Ticket #1079 (closed feature request: wontfix)

Opened 6 years ago

Last modified 5 years ago

refinement for GHC's support of UTF-8 encoding

Reported by: mukai@… Owned by:
Priority: normal Milestone: 6.8.2
Component: Compiler Version: 6.6
Keywords: Cc: Bulat.Ziganshin@…, id@…, shelarcy@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description (last modified by Isaac Dupree) (diff)

Since version 6.6, GHC supports UTF-8 encoding in source programs: it can read UTF-8 files and convert them into Unicode characters. However, there is no support for reading or printing those characters.

For example, we can compile the following program,

main = putStrLn "あ"

but we only get B, the least significant 8 bits of the character (U+3042, whose low byte 0x42 is ASCII 'B'). Because of this gap, a program whose source is written in UTF-8 cannot print any non-ASCII character without converting it first. Although it is easy to write conversion functions for this purpose, such conversion should be supported by the compiler.
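A minimal sketch of the kind of conversion function mentioned above, assuming a GHC 6.6-style Handle that writes only the low 8 bits of each Char. The names lowByte, encodeChar, encodeString and putStrLnUtf8 are hypothetical and not part of any library. On a Handle that applies a text encoding of its own, this would encode twice unless the Handle is in binary mode.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- What a Handle that keeps only the low 8 bits emits for a Char.
-- For U+3042 this is 0x42, i.e. 'B'.
lowByte :: Char -> Char
lowByte c = chr (ord c .&. 0xFF)

-- Encode one code point as UTF-8 octets, each octet held in a Char < 256.
encodeChar :: Char -> String
encodeChar c
  | n < 0x80    = [c]
  | n < 0x800   = map chr [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = map chr [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = map chr [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                          , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation octet: 10xxxxxx carrying the low 6 bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

encodeString :: String -> String
encodeString = concatMap encodeChar

-- Hypothetical wrapper: encode first, then let the Handle write the octets.
putStrLnUtf8 :: String -> IO ()
putStrLnUtf8 = putStrLn . encodeString
```

With this, putStrLnUtf8 "あ" would hand the three octets 0xE3 0x81 0x82 to the Handle instead of the single truncated octet 0x42.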

IMHO, the desired approach is similar to that of Hugs. When printing non-ASCII characters, Hugs first converts them to UTF-8 octets and then prints those. With a binary-mode Handle, however, it just prints the characters without conversion. This behavior should be acceptable to many Haskell programmers.

Change History

Changed 6 years ago by igloo

  • milestone set to 6.8

Changed 6 years ago by guest

  • cc Bulat.Ziganshin@… added

His next question will be "how can I read those Unicode chars printed by putChar?" :) I think the whole problem is about a new I/O library.

Changed 6 years ago by Isaac Dupree

  • cc id@… added
  • description modified (diff)

This reminds me of cases like "\213\23\231" ( = '\213' : '\23' : '\231' : [] according to the Report), where GHC treated several of the escapes as one Unicode character. We should probably say explicitly somewhere that a String is shaped like UTF-32 (so that each Char in the list is one Unicode code point), and make that true for all the standard functions.
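To make the Report's reading concrete, here is a small sketch; sample and codePoints are illustrative names only. Each decimal escape denotes its own Char, so the literal is a three-element list.

```haskell
import Data.Char (ord)

-- The literal from the comment above: three separate decimal escapes.
sample :: String
sample = "\213\23\231"

-- View a String as the list of code points it contains, one per Char.
codePoints :: String -> [Int]
codePoints = map ord
```

Under the Report's rules, length sample is 3 and codePoints sample is [213,23,231]; the three escapes are never merged into one character.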

Even if we assume that standard I/O uses UTF-8 (it has to, for ASCII compatibility), if String is in practice also used for binary data (is it?), the only compatible way might be to bring in a new I/O library, as Bulat says. Personally, I would like the Prelude input and output functions to use UTF-8 as the external format.
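On the input side, using UTF-8 as the external format means turning octets back into code points. A rough sketch, assuming the input String holds one octet per Char (as a byte-oriented read would produce); decodeString, isCont and replacement are hypothetical names. It substitutes U+FFFD for malformed input and, to stay short, does not reject overlong forms or surrogates.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- U+FFFD, emitted for any octet sequence this sketch cannot decode.
replacement :: Char
replacement = '\xFFFD'

-- True for continuation octets of the form 10xxxxxx.
isCont :: Char -> Bool
isCont x = let o = ord x in o >= 0x80 && o < 0xC0

-- Decode a String of UTF-8 octets into a String of code points.
decodeString :: String -> String
decodeString [] = []
decodeString (c:cs)
  | n < 0x80              = c : decodeString cs
  | n >= 0xC0 && n < 0xE0 = multi 1 (n .&. 0x1F)
  | n >= 0xE0 && n < 0xF0 = multi 2 (n .&. 0x0F)
  | n >= 0xF0 && n < 0xF8 = multi 3 (n .&. 0x07)
  | otherwise             = replacement : decodeString cs
  where
    n = ord c
    -- Consume k continuation octets after a lead octet's payload bits.
    multi k lead
      | length conts == k && all isCont conts && value <= 0x10FFFF
                  = chr value : decodeString rest
      | otherwise = replacement : decodeString cs
      where
        (conts, rest) = splitAt k cs
        value = foldl step lead conts
    -- Shift in the low 6 bits of each continuation octet.
    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)
```

For instance, the octets 0xE3 0x81 0x82 decode back to the single character U+3042, while plain ASCII input passes through unchanged.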

Changed 6 years ago by guest

  • cc shelarcy@… added

Changed 6 years ago by simonmar

  • status changed from new to closed
  • resolution set to wontfix

This is part of the much larger issue of how to support Unicode in the I/O library, which is unclear.

Changed 6 years ago by igloo

  • milestone changed from 6.8 branch to 6.8.2

Changed 5 years ago by simonmar

  • architecture changed from Unknown to Unknown/Multiple

Changed 5 years ago by simonmar

  • os changed from Unknown to Unknown/Multiple