Ticket #5436 (closed bug: fixed)
text decoding doesn't use recover on eof
Description
ghc-7.2.1 provides a way for TextEncodings to recover from decoding errors. However, that functionality does not work for incomplete byte sequences at the end of a file; in that case, it throws an error regardless of the recovery function. This is a problem since it makes it difficult to ensure that a program won't throw an exception on bad input.
Reproduction steps:
ghc --make GetChar.hs
ghc -e "Data.ByteString.hPut System.IO.stdout (Data.ByteString.pack [200])" | ./GetChar
where GetChar.hs is the following module:
{-# LANGUAGE RecordWildCards #-}
module Main where

import System.IO
import GHC.IO.Encoding
import GHC.IO.Encoding.Failure

main = do
    mkRecoveringLocaleEncoding "UTF-8" >>= hSetEncoding stdin
    getChar >>= print

mkRecoveringLocaleEncoding :: String -> IO TextEncoding
mkRecoveringLocaleEncoding name = do
    enc <- mkTextEncoding name
    return $ case enc of
        TextEncoding {..} -> TextEncoding {
            mkTextDecoder = fmap (setRecover $ recoverDecode TransliterateCodingFailure)
                                 mkTextDecoder,
            mkTextEncoder = fmap (setRecover $ recoverEncode TransliterateCodingFailure)
                                 mkTextEncoder,..
            }
  where
    setRecover r x = x { recover = r }
Result:
GetChar: <stdin>: hGetChar: invalid argument (invalid byte sequence for this encoding)
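For context: byte 200 (0xC8) is the lead byte of a two-byte UTF-8 sequence, so the input ends in the middle of a character. The sketch below is plain Haskell and does not use GHC's TextEncoding machinery. It only illustrates the result one might expect if recovery were applied at EOF, assuming the recovery substitutes U+FFFD for the truncated sequence. `decodeLenient` is a hypothetical helper written for this illustration, not a function from base.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper (not part of base): decode UTF-8 bytes, emitting
-- U+FFFD for every malformed or truncated sequence instead of failing.
decodeLenient :: [Word8] -> String
decodeLenient [] = []
decodeLenient (b : bs)
  | b < 0x80               = chr (fromIntegral b) : decodeLenient bs
  | b >= 0xC2 && b <= 0xDF = multi 1 (fromIntegral b .&. 0x1F) 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 (fromIntegral b .&. 0x0F) 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 (fromIntegral b .&. 0x07) 0x10000
  | otherwise              = '\xFFFD' : decodeLenient bs
  where
    -- Try to consume n continuation bytes after the lead byte. If the
    -- input ends early (the EOF case in this ticket) or the sequence is
    -- invalid, emit U+FFFD and resume right after the lead byte.
    multi n lead minCp =
      let (conts, rest) = splitAt n bs
          cp = foldl step lead conts
      in if length conts == n && all isCont conts && valid minCp cp
           then chr cp : decodeLenient rest
           else '\xFFFD' : decodeLenient bs
    isCont c = c .&. 0xC0 == 0x80
    step acc c = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
    -- Reject overlong forms, surrogates, and values beyond U+10FFFF.
    valid minCp cp =
      cp >= minCp && cp <= 0x10FFFF && not (cp >= 0xD800 && cp <= 0xDFFF)
```

With this sketch, `decodeLenient [200]` evaluates to a one-character string containing U+FFFD rather than raising an error, which is the kind of outcome the reproduction above was hoping to get from `getChar`.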
In the course of investigating the issue, I found the following comment near the definition of GHC.IO.Handle.streamEncode:
-- FIXME: we should use recover to deal with EOF, rather than always throwing an
-- IOException (ioe_invalidCharacter).
So I guess this ticket records my vote to fix that problem.
