Ticket #3865 (closed bug: invalid)

Opened 3 years ago

Last modified 3 years ago

On amd64, reading attached file gives "hGetContents: invalid argument (Invalid or incomplete multibyte or wide character)"

Reported by: dsf
Owned by:
Priority: normal
Milestone:
Component: Compiler
Version: 6.13
Keywords:
Cc:
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: None/Unknown
Difficulty:
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

This error seems to be related to the locale somehow, but it never happens on i386, so I'm a bit mystified. Another file triggered the same error in a build environment that had no files in /usr/lib/locales, but it could be read once those files were present. The attached file fails whether or not the files are in /usr/lib/locales. Just read it to get the error:

readFile "testinput" >>= putStr

Hopefully it is reproducible elsewhere and isn't specific to our build. We are using ghc 6.13-20091231.

Attachments

testinput (14.1 KB) - added by dsf 3 years ago.
A file from haskell-hsx that can't be read by ghc 6.13-20091231 on an amd64

Change History

Changed 3 years ago by dsf

A file from haskell-hsx that can't be read by ghc 6.13-20091231 on an amd64

Changed 3 years ago by simonmar

  • status changed from new to closed
  • resolution set to invalid

The file is encoded in ISO-8859-1 (aka Latin-1), so to read it you need either to run under a Latin-1 locale or to set the encoding explicitly with hSetEncoding h latin1. Presumably the locale is set to UTF-8 on your x86-64 system but to Latin-1 on your i386 system.
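As a minimal sketch of the second option, the program below reads the attachment with an explicit Latin-1 decoder, independent of the locale. The helper name readLatin1 is made up for illustration, and setting stdout to UTF-8 is an assumption about the terminal, not something the ticket states.

```haskell
import System.IO

-- Hypothetical helper (not a library function): read a whole file as
-- ISO-8859-1, regardless of the current locale.
readLatin1 :: FilePath -> IO String
readLatin1 path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h latin1            -- every byte decodes to exactly one Char
    contents <- hGetContents h
    -- hGetContents is lazy, so force the string before withFile closes h.
    length contents `seq` return contents

main :: IO ()
main = do
  hSetEncoding stdout utf8           -- assumption: the terminal expects UTF-8
  readLatin1 "testinput" >>= putStr
```

Because Latin-1 maps every byte to a character, this read cannot fail with a decoding error, but a file that is really UTF-8 would come out garbled rather than rejected.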

Changed 3 years ago by dsf

Oh, I didn't realize that î was in Latin-1 (0xEE). Hmm, the error is coming from hscolour, and I'm concerned it might run into UTF-8 files it can't handle if I change the locale to Latin-1. Maybe I'll change the file's encoding to UTF-8 instead.

Should hscolour be able to handle files of either encoding? Should it be using openBinaryFile?
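For reference, here is a rough sketch of what a binary-mode handle would give a tool like hscolour. A binary handle does no decoding at all: each byte becomes one Char in the range 0 to 255, so reading never raises an encoding error, but a multi-byte UTF-8 character arrives as several separate Chars. The helper name readBytesAsChars and the sample file name are made up for this example.

```haskell
import System.IO

-- Hypothetical helper: read a file through a binary-mode handle,
-- so that each byte is returned as one Char in '\0'..'\255'.
readBytesAsChars :: FilePath -> IO String
readBytesAsChars path =
  withBinaryFile path ReadMode $ \h -> do
    contents <- hGetContents h
    -- Force the lazy string before the handle is closed.
    length contents `seq` return contents

main :: IO ()
main = do
  -- Write the single character 'î' (U+00EE) as UTF-8.
  withFile "utf8-sample.txt" WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h "\xEE"
  s <- readBytesAsChars "utf8-sample.txt"
  print (map fromEnum s)   -- prints [195,174]: the two UTF-8 bytes, one Char each
```

So binary mode avoids the error but pushes the decoding problem onto the program, which would then have to interpret the bytes itself.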

Changed 3 years ago by malcolm.wallace@…

hscolour-1.16 contains a bugfix for encoding issues: when built with ghc-6.12.1, it now forces the input and output encodings to be UTF-8. This matches ghc's behaviour, which is also to insist that Haskell source files must be encoded in UTF-8.

Changed 3 years ago by dsf

Ok, I get it now. There were some places in our autobuilder where LANG was explicitly unset; I removed them and things are working smoothly now.
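A small diagnostic along these lines can help spot this kind of environment difference. It is only a sketch: it prints the encoding GHC's I/O library chose from the current locale and the encoding attached to stdin, using localeEncoding and hGetEncoding from System.IO.

```haskell
import System.IO

main :: IO ()
main = do
  -- Encoding the I/O library selected from the current locale settings.
  putStrLn ("locale encoding: " ++ show localeEncoding)
  -- Encoding attached to stdin; Nothing means the handle is in binary mode.
  enc <- hGetEncoding stdin
  putStrLn ("stdin encoding:  " ++ maybe "binary" show enc)
```

Running it once inside the autobuilder environment and once in a normal login shell should show whether the two disagree about the default encoding.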
