Ticket #5436 (closed bug: fixed)

Opened 21 months ago

Last modified 19 months ago

text decoding doesn't use recover on eof

Reported by: judahj
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 7.2.1
Keywords:
Cc: shelarcy@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Difficulty:
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

ghc-7.2.1 provides a way for TextEncodings to recover from decoding errors. However, that functionality does not work for an incomplete byte sequence at the end of a file: in that case, decoding throws an error regardless of the recovery function. This is a problem because it makes it difficult to ensure that a program won't throw an exception on bad input.

Reproduction steps:

ghc --make GetChar.hs
ghc -e "Data.ByteString.hPut System.IO.stdout (Data.ByteString.pack [200])" | ./GetChar

where GetChar.hs is the following module:

{-# LANGUAGE RecordWildCards #-}
module Main where

import System.IO
import GHC.IO.Encoding
import GHC.IO.Encoding.Failure

main :: IO ()
main = do
    -- Decode stdin as UTF-8, using the recovering encoding defined below.
    mkRecoveringLocaleEncoding "UTF-8" >>= hSetEncoding stdin
    getChar >>= print

mkRecoveringLocaleEncoding :: String -> IO TextEncoding
mkRecoveringLocaleEncoding name = do
    enc <- mkTextEncoding name
    return $ case enc of
        TextEncoding {..} -> TextEncoding {
                mkTextDecoder = fmap (setRecover $ recoverDecode TransliterateCodingFailure)
                                    mkTextDecoder,
                mkTextEncoder = fmap (setRecover $ recoverEncode TransliterateCodingFailure)
                                    mkTextEncoder,..
            }
  where
    setRecover r x = x { recover = r }

Result:

GetChar: <stdin>: hGetChar: invalid argument (invalid byte sequence for this encoding)
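
The input reaches the end-of-file path because byte 200 (0xC8) is a UTF-8 lead byte that announces a two-byte sequence, and the stream ends before the continuation byte arrives. A rough sketch of that classification follows; the helper names utf8SequenceLength and truncatedAtEof are invented here for illustration and are not part of base.

```haskell
import Data.Word (Word8)

-- | Length of the UTF-8 sequence announced by a lead byte, or Nothing
-- if the byte cannot start a well-formed sequence.
-- Hypothetical helper for illustration; not part of base.
utf8SequenceLength :: Word8 -> Maybe Int
utf8SequenceLength b
    | b < 0x80  = Just 1   -- 0xxxxxxx: ASCII
    | b < 0xC2  = Nothing  -- continuation bytes and overlong leads 0xC0, 0xC1
    | b < 0xE0  = Just 2   -- 110xxxxx
    | b < 0xF0  = Just 3   -- 1110xxxx
    | b < 0xF5  = Just 4   -- 11110xxx, up to U+10FFFF
    | otherwise = Nothing  -- 0xF5..0xFF never appear in UTF-8

-- | True if the input ends before the sequence started by its first
-- byte is complete. Only the first sequence is inspected.
truncatedAtEof :: [Word8] -> Bool
truncatedAtEof []         = False
truncatedAtEof bs@(b : _) = maybe False (> length bs) (utf8SequenceLength b)
```

With the single-byte input from the reproduction steps, utf8SequenceLength 200 gives Just 2 while only one byte is available, so truncatedAtEof [200] holds.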

In the course of investigating the issue, I found the following comment near the definition of GHC.IO.Handle.streamEncode:

-- FIXME: we should use recover to deal with EOF, rather than always throwing an
-- IOException (ioe_invalidCharacter).

So I guess this ticket records my vote to fix that problem.
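
On an affected version, one possible stopgap is to guard the read itself instead of relying on the recovery function. The following is a minimal sketch and not something proposed in this ticket: hGetCharLenient is a hypothetical name, and it assumes the decoding failure surfaces as an IOException whose type is InvalidArgument, which matches the "invalid argument" text in the result above.

```haskell
import Control.Exception (try)
import GHC.IO.Exception (IOErrorType (InvalidArgument), IOException (..))
import System.IO

-- | Read one character, returning U+FFFD if the read fails with an
-- InvalidArgument IOException (assumed here to be how a decoding
-- failure is reported). Every other error, including a genuine end
-- of file, is rethrown unchanged.
-- Hypothetical helper for illustration; not part of base.
hGetCharLenient :: Handle -> IO Char
hGetCharLenient h = do
    r <- try (hGetChar h) :: IO (Either IOException Char)
    case r of
        Right c -> return c
        Left e
            | ioe_type e == InvalidArgument -> return '\xFFFD'
            | otherwise                     -> ioError e
```

This covers a single read only. The sketch does not guarantee that the offending bytes are consumed after the exception, so a loop built on it should not assume the handle has advanced.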

Change History

Changed 21 months ago by shelarcy

  • cc shelarcy@… added

Changed 20 months ago by batterseapower

  • status changed from new to merge

Fixed in 901edcb2bb342e7943400afe2ea6772998ecbf95, tested in d29666f681514f7554d9ca49e3d4bd42ff0d83b5

Changed 19 months ago by igloo

  • status changed from merge to closed
  • resolution set to fixed

Merged as changeset:ef7ecf82b23831805503d3a4e7ab51305d99cb2a
