Ticket #5257 (closed bug: fixed)

Opened 2 years ago

Last modified 2 years ago

Calling fail on a UTF-8 encoded string (in file) causes garbage to be printed

Reported by: anthony.de.almeida.lopes
Owned by:
Priority: normal
Milestone: 7.2.1
Component: Runtime System
Version: 7.0.2
Keywords:
Cc:
Operating System: Linux
Architecture: x86_64 (amd64)
Type of failure: Incorrect result at runtime
Difficulty:
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description (last modified by simonmar)

For example,

guerrilla@delta:/tmp/foo$ cat Test.hs 
module Main where

main :: IO ()
main =
    do
        putStrLn "μ"
        fail "μ"
guerrilla@delta:/tmp/foo$ ./Test 
μ
Test: user error (�)
guerrilla@delta:/tmp/foo$ ./Test 2>&1 | xxd
0000000: cebc 0a54 6573 743a 2075 7365 7220 6572  ...Test: user er
0000010: 726f 7220 28bc 290a                      ror (.).

Using either encodeString or writing the string with escaped hexadecimal bytes does work.

Change History

Changed 2 years ago by simonmar

  • status changed from new to closed
  • resolution set to fixed
  • description modified

Fixed in 7.2.1, thanks to Max's work on Unicode. The problem was that the main exception handler converts the exception to a String and then passes it to the RTS error function using withCString, which until recently did no encoding.
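
As a rough illustration of the explanation above, the sketch below compares the UTF-8 encoding of μ (U+03BC) with what is left if each Char is simply cut down to its low 8 bits. That truncation is an assumption about the old marshalling behaviour, chosen because it matches the lone bc byte in the hex dump; truncateToByte and utf8Encode are hypothetical helpers written for this example, not GHC functions.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import Numeric (showHex)

-- Assumed model of the old behaviour: keep only the low 8 bits of a Char.
truncateToByte :: Char -> Word8
truncateToByte = fromIntegral . ord

-- Hand-written UTF-8 encoder for a single Char (illustrative only).
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    hi s   = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

main :: IO ()
main = do
  let mu = '\x03BC' -- the character μ
  putStrLn ("UTF-8 bytes:    " ++ unwords (map hex (utf8Encode mu)))  -- ce bc
  putStrLn ("truncated byte: " ++ hex (truncateToByte mu))            -- bc
  where
    hex b = showHex b ""
```

The first line corresponds to the bytes that putStrLn wrote in the report (ce bc), and the second to the single byte that appeared inside "user error (...)".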

Changed 2 years ago by simonmar

  • milestone set to 7.2.1

Changed 2 years ago by anthony.de.almeida.lopes

Does anyone know if the encodeString workaround will start to fail when I upgrade? Thanks.

Changed 2 years ago by batterseapower

I'm pretty sure your workaround will start to fail for non-ASCII strings. The UTF-8 encoded bytes that encodeString injects back into Chars will contain some bytes > 127 and so will be subject to another round of UTF-8 encoding when GHC encodes the String for the console.

ASCII strings should work fine either way.
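
A minimal sketch of the double encoding described in this comment, under stated assumptions: bytesAsChars is a hypothetical stand-in for the described effect of encodeString (UTF-8 bytes placed back into Chars), and utf8Encode is a hand-written encoder standing in for the encoding GHC applies on output. Neither is the real library code.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hand-written UTF-8 encoder for a single Char (illustrative only).
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    hi s   = fromIntegral (n `shiftR` s)
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical stand-in for the workaround: one Char per UTF-8 byte.
bytesAsChars :: String -> String
bytesAsChars = map (chr . fromIntegral) . concatMap utf8Encode

main :: IO ()
main = do
  let workaround = bytesAsChars "\x03BC"   -- μ becomes two Chars
  print (map ord workaround)               -- [206,188]
  -- Encoding the workaround string again yields four bytes, not two.
  print (concatMap utf8Encode workaround)  -- [195,142,194,188]
  -- ASCII input is unchanged by either step.
  print (concatMap utf8Encode (bytesAsChars "ok"))  -- [111,107]
```

With an encoding-aware runtime, the console would therefore receive c3 8e c2 bc for μ instead of ce bc, while ASCII text comes through untouched.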
