Ticket #3837 (closed bug: fixed)

Opened 3 years ago

Last modified 2 years ago

hsc2hs and utf-8

Reported by: TaruKarttunen Owned by: simonmar
Priority: normal Milestone: 7.0.2
Component: hsc2hs Version: 6.12.2
Keywords: Cc: pho@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description (last modified by igloo)

hsc2hs fails on a source file that contains certain UTF-8 characters in comments.

Attached is an example module.

The probable culprit is:

    showCChar c    = ['\\',
                      intToDigit (ord c `quot` 64),
                      intToDigit (ord c `quot` 8 `mod` 8),
                      intToDigit (ord c          `mod` 8)]

in hsc2hs code.
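
To illustrate why this fragment is suspect, here is a minimal sketch (the names `escapeDigits` and `fitsOctalEscape` are hypothetical and not part of hsc2hs). It computes the three values that the quoted code passes to `intToDigit`. A three-digit octal escape can only represent values up to 511, and `intToDigit` itself only accepts arguments in the range 0..15, so a character such as U+30A8 cannot be escaped this way.

```haskell
import Data.Char (ord)

-- The three values that the quoted fragment hands to intToDigit.
-- Hypothetical helper for illustration only.
escapeDigits :: Char -> [Int]
escapeDigits c =
  [ ord c `quot` 64
  , ord c `quot` 8 `mod` 8
  , ord c `mod` 8
  ]

-- A three-digit octal escape is well formed only if every digit is in 0..7.
fitsOctalEscape :: Char -> Bool
fitsOctalEscape = all (\d -> d >= 0 && d <= 7) . escapeDigits
```

In GHCi, `escapeDigits '\n'` gives `[0,1,2]` (the escape `\012`), while `escapeDigits '\x30A8'` gives `[194,5,0]`, whose first element is far outside what `intToDigit` accepts.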

Attachments

A.hsc (7 bytes) - added by TaruKarttunen 3 years ago.

Change History

Changed 3 years ago by TaruKarttunen

Not sure if the encoding got preserved.

It is simply:

    oz:HsOpenSSL$ od -t x1 A.hsc
    0000000 2d 2d 20 e3 82 a8 0a
    0000007

The character is Japanese: U+30A8 エ (KATAKANA LETTER E), encoded in UTF-8 as e3 82 a8.
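
The byte sequence can be checked by hand. The following is a small sketch of the standard UTF-8 encoding rules (the function name `utf8Bytes` is made up for this example; it does not validate surrogates or out-of-range input).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)

-- Encode one code point as its UTF-8 byte values.
-- Illustration only: no validation of surrogate code points.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18)
                  , cont (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  where
    n = ord c
    -- A continuation byte carries the low six bits under the 10xxxxxx prefix.
    cont x = 0x80 .|. (x .&. 0x3F)
```

For example, `utf8Bytes '\x30A8'` yields `[0xE3,0x82,0xA8]`, matching the three bytes in the od dump.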

Changed 3 years ago by TaruKarttunen

  • component changed from Compiler to hsc2hs

Changed 3 years ago by PHO

  • cc pho@… added

Changed 3 years ago by igloo

  • description modified

Changed 3 years ago by igloo

  • milestone set to 6.12.2

Changed 3 years ago by simonmar

  • owner set to simonmar
  • status changed from new to assigned

Changed 3 years ago by simonmar

  • owner changed from simonmar to igloo
  • status changed from assigned to new
  • type changed from bug to merge

Fixed:

Fri Mar 19 21:56:57 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * Process the file in binary mode so we pass through UTF-8 (#3837)
  Maybe strictly speaking it would be better to encode/decode UTF-8, but
  it would be a fiddle and I don't think it really matters for hsc2hs.

Mon Mar 22 13:55:55 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * add a comment
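
As a rough sketch of what reading in binary mode means (this is not the actual patch; `readFileBinary` and `fileBytes` are hypothetical helpers), a binary-mode handle delivers each byte as one `Char` in the range '\0'..'\255', so multi-byte UTF-8 sequences pass through as individual bytes rather than as decoded code points.

```haskell
import Data.Char (ord)
import System.IO

-- Hypothetical helper, not the actual hsc2hs code: read a file through a
-- binary-mode handle, so each byte arrives as one Char in '\0'..'\255'.
readFileBinary :: FilePath -> IO String
readFileBinary path = do
  h <- openBinaryFile path ReadMode
  s <- hGetContents h
  length s `seq` hClose h  -- force the whole lazy read before closing
  return s

-- The numeric byte values of a file, e.g. for comparison with od output.
fileBytes :: FilePath -> IO [Int]
fileBytes path = fmap (map ord) (readFileBinary path)
```

Applied to the attached A.hsc, `fileBytes` would be expected to return the same seven byte values that od reports, each small enough for a three-digit octal escape.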

Changed 3 years ago by igloo

  • status changed from new to closed
  • resolution set to fixed

Merged.

Changed 2 years ago by awson

  • owner igloo deleted
  • status changed from closed to new
  • version changed from 6.12.1 to 6.12.2
  • resolution fixed deleted

Now it is broken even worse. For example:

#{test З}

where З is the UTF-8 encoded character U+0417 (decimal 1047), fails with the error "<stdout>: commitBuffer: invalid argument (character is not in the code page)". I believe any UTF-8 encoded character above 127 triggers this error.

Changed 2 years ago by simonmar

  • owner set to simonmar
  • type changed from merge to bug
  • milestone changed from 6.12.2 to 7.0.2

Changed 2 years ago by simonmar

  • status changed from new to merge

Fixed:

Wed Dec 22 01:14:55 PST 2010  Simon Marlow <marlowsd@gmail.com>
  * write the output file in binary mode (#3837)
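
A minimal sketch of the output side, again not the actual patch (`writeFileBinary` is a hypothetical helper): once the input has been read as raw bytes, writing through a binary-mode handle emits each `Char` below 256 as a single byte, with no text encoding step that could reject it.

```haskell
import System.IO

-- Hypothetical helper, not the actual hsc2hs code: write a String through a
-- binary-mode handle. Assumes every Char is below 256, as is the case for
-- data that was itself read in binary mode.
writeFileBinary :: FilePath -> String -> IO ()
writeFileBinary path contents =
  withBinaryFile path WriteMode (\h -> hPutStr h contents)
```

For an already open handle such as `stdout`, `hSetBinaryMode stdout True` switches to the same byte-for-byte behaviour.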

Changed 2 years ago by igloo

  • status changed from merge to closed
  • resolution set to fixed

Merged.
