Ticket #1744 (closed merge: fixed)

Opened 6 years ago

Last modified 5 years ago

treat byte order mark as zero-width whitespace

Reported by: igloo
Owned by: igloo
Priority: normal
Milestone: 6.8.2
Component: Compiler (Parser)
Version: 6.8
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure:
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

The U+FEFF ZERO WIDTH NO-BREAK SPACE Unicode character, better known as BYTE ORDER MARK (BOM), currently gives a lexical error:

$ printf '\xEF\xBB\xBF\nz = "str"\n' > z.hs
$ ghci z.hs
GHCi, version 6.8.0.20070927: http://www.haskell.org/ghc/  :? for help
Loading package base ... linking ... done.

z.hs:1:0: lexical error at character '\65279'
Failed, modules loaded: none.
Prelude> Leaving GHCi.

The character is only in categories Other and Format, not Space, but I think we should lex it as whitespace anyway (with zero width for the purposes of the layout rule). Ideally Haskell' would do likewise.
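The category claim can be checked from Haskell itself. The following is a minimal sketch, not GHC's lexer code: `bom` and `isLexWhitespace` are hypothetical names introduced here to illustrate treating U+FEFF as whitespace even though `Data.Char.isSpace` rejects it.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isSpace)

-- | U+FEFF ZERO WIDTH NO-BREAK SPACE, better known as the byte order mark.
-- Its code point is 65279, matching the '\65279' in the lexical error.
bom :: Char
bom = '\xFEFF'

-- | Hypothetical predicate (an assumption, not GHC's implementation):
-- whatever Data.Char already counts as space, plus the BOM.
isLexWhitespace :: Char -> Bool
isLexWhitespace c = isSpace c || c == bom

-- | Summary of how Data.Char classifies the BOM:
-- its general category, and whether isSpace accepts it.
bomClassification :: (GeneralCategory, Bool)
bomClassification = (generalCategory bom, isSpace bom)
```

In GHCi, `bomClassification` reports the category `Format` and shows that `isSpace` does not accept the character, which is why a lexer that relies on `isSpace` alone falls through to a lexical error.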

Change History

Changed 6 years ago by guest

Please note that, according to http://en.wikipedia.org/wiki/Byte_Order_Mark, the character U+FEFF is considered a BOM only if it appears as the first character of a file. In the context of UTF-8 it simply serves to identify the encoding. So there's no need to lex it as space; it only needs to be ignored as the first character of a source file. Thanks.
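The narrower behaviour suggested in this comment, ignoring U+FEFF only at the very start of the input, can be sketched as below. This assumes the source has already been decoded to a `String`; `stripLeadingBOM` is a hypothetical helper for illustration, not the code from the actual GHC patch.

```haskell
-- | Hypothetical helper (an assumption, not GHC's patch): drop a single
-- U+FEFF if it is the first character of the decoded source text.
-- A U+FEFF anywhere else is left untouched, so it would still be
-- reported by the lexer.
stripLeadingBOM :: String -> String
stripLeadingBOM ('\xFEFF' : rest) = rest
stripLeadingBOM source            = source
```

Applied to the decoded contents of the `z.hs` file from the description, this would remove the mark and leave the newline and the `z = "str"` line unchanged.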

Changed 6 years ago by igloo

  • milestone set to 6.10 branch

Changed 5 years ago by Porges

Can I vote for this also? Some editors insist upon inserting the BOM when working with UTF-8 source, and this bug is highly annoying.

Changed 5 years ago by simonmar

  • owner set to simonmar

I'm on it

Changed 5 years ago by simonmar

  • status changed from new to closed
  • type changed from feature request to merge
  • resolution set to fixed
  • milestone changed from 6.10 branch to 6.8.2

Fixed:

Fri Nov 30 10:11:00 GMT 2007  Simon Marlow <simonmar@microsoft.com>
  * FIX #1744: ignore the byte-order mark at the beginning of a file

Changed 5 years ago by simonmar

  • status changed from closed to reopened
  • resolution fixed deleted

Changed 5 years ago by simonmar

  • owner changed from simonmar to igloo
  • status changed from reopened to new

Changed 5 years ago by igloo

  • status changed from new to closed
  • resolution set to fixed

Merged

Changed 5 years ago by simonmar

  • architecture changed from Unknown to Unknown/Multiple

Changed 5 years ago by simonmar

  • os changed from Unknown to Unknown/Multiple