Ticket #1103 (closed bug: fixed)

Opened 5 years ago

Last modified 3 years ago

Japanese Unicode

Reported by: humasect
Owned by:
Priority: normal
Milestone: 6.10 branch
Component: Compiler (Parser)
Version: 6.6
Keywords: japanese unicode lexical -fglasgow-exts
Cc: pho@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure:
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

Using Japanese characters (either katakana or hiragana) in identifiers results in this error:

Source/Hehe.hs:12:0: lexical error at character '\12390'

Haskell 98 poses no problem for upper/lower case identifiers and type constructor identification with the two complementary Japanese character sets. With -fglasgow-exts, other Unicode characters used for various operators work great.

Attachments

UniTest.hs Download (0.7 KB) - added by humasect 5 years ago.
1 working out of 3 unicode tests
CJK.hs Download (0.8 KB) - added by humasect 5 years ago.
Here is an idea/proposal for some kind of simple extension to also allow backward-compatible "international" source code. Multilingual language?

Change History

Changed 5 years ago by humasect

  • priority changed from normal to high

Changed 5 years ago by simonmar

  • priority changed from high to normal
  • difficulty changed from Easy (1 hr) to Unknown

Please attach some example code illustrating the bug.

BTW, the "priority" field of the ticket is mainly for the GHC developers so we can prioritise tickets; please use "severity" to indicate how badly the bug affects you. Someday I'll figure out how to put a link to some docs next to these fields on the ticket page.

Changed 5 years ago by humasect

1 working out of 3 unicode tests

Changed 5 years ago by humasect

  • os changed from MacOS X to Multiple

My apologies. I've attached some test code. We could really use Japanese identifiers in in-house development. I don't know what to say about upper/lower case for identifiers and constructors. I could create some sort of example code for conventions that would work very well. Thanks again.

Changed 5 years ago by ross

I don't think there's any reason why these characters couldn't be treated as upper or lower case letters; the question is which. We'd want to treat all the members of a Unicode General Category the same way, because special cases would be too cumbersome. A lexical syntax based on the case of letters was never going to work well with caseless scripts.

Kanji, katakana and hiragana all belong to the Letter, Other category. If we treated these as lower case, your third example would work, but you'd have to adopt a convention of prepending capital letters (like M, C, T and D) to Japanese module, class, type and data constructor names. (The same would apply to all the other caseless scripts too.)
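The category claim can be checked with `Data.Char` from `base`. The following is only a small sketch: the three sample characters and the helper name `isCaseless` are chosen here for illustration and do not come from the ticket's attachments. `'\12390'` is the character named in the reported lexical error.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isAlpha, isLower, isUpper)

-- One sample character per script mentioned in the ticket (illustrative choice).
samples :: [(String, Char)]
samples =
  [ ("hiragana", '\12390') -- U+3066, the character from the lexical error
  , ("katakana", '\12486') -- U+30C6
  , ("kanji",    '\26085') -- U+65E5
  ]

-- A letter that Unicode classifies as neither upper nor lower case.
isCaseless :: Char -> Bool
isCaseless c = isAlpha c && not (isUpper c) && not (isLower c)

main :: IO ()
main = mapM_ report samples
  where
    report (script, c) =
      putStrLn (script ++ ": " ++ show (generalCategory c)
                ++ ", caseless = " ++ show (isCaseless c))
```

Each sample should report the `OtherLetter` category, which is why a lexer that only accepts upper- and lower-case letters at the start of a name has nowhere to put them.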

Changed 5 years ago by igloo

Another option would be to treat them as neither upper nor lower case, so they could be part of a name but not the first character of it. I think treating them as lower case makes more sense, though. Whatever we do, we should make sure Haskell' matches it.
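The options discussed so far can be compared with a toy model. This is a simplified sketch and not GHC's lexer: the `Policy` and `Start` types and both functions are hypothetical names introduced only to show how each choice treats a "Letter, Other" character at the start of a name and inside one.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical policies for "Letter, Other" characters.
data Policy
  = Reject       -- behaviour reported in this ticket: lexical error
  | AsLowerCase  -- treat as lower-case letters
  | ContinueOnly -- allowed inside a name, but not as its first character
  deriving (Eq, Show)

data Start = VarStart | ConStart | NoStart
  deriving (Eq, Show)

-- How a character is treated as the first character of an identifier.
classifyStart :: Policy -> Char -> Start
classifyStart policy c = case generalCategory c of
  UppercaseLetter -> ConStart
  TitlecaseLetter -> ConStart
  LowercaseLetter -> VarStart
  OtherLetter
    | policy == AsLowerCase -> VarStart
    | otherwise             -> NoStart
  _ | c == '_'  -> VarStart
    | otherwise -> NoStart

-- Whether a character may appear after the first character of an identifier.
canContinue :: Policy -> Char -> Bool
canContinue policy c = case generalCategory c of
  UppercaseLetter -> True
  TitlecaseLetter -> True
  LowercaseLetter -> True
  DecimalNumber   -> True
  OtherLetter     -> policy /= Reject
  _               -> c == '_' || c == '\''

main :: IO ()
main = mapM_ report [Reject, AsLowerCase, ContinueOnly]
  where
    te = '\12390' -- the character from the reported error
    report p =
      putStrLn (show p ++ ": start = " ++ show (classifyStart p te)
                ++ ", continue = " ++ show (canContinue p te))
```

Under `AsLowerCase` a name may begin with kana or kanji but is then always a variable-like name, which is what forces the capital-letter prefix convention for constructors.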

Changed 5 years ago by simonmar

  • milestone changed from 6.6.1 to 6.8

Punt to 6.8: this requires further thought and coordination with Haskell'.

Changed 5 years ago by humasect

Here is an idea/proposal for some kind of simple extension to also allow backward-compatible "international" source code. Multilingual language?

Changed 5 years ago by igloo

  • milestone changed from 6.8 to 6.10 branch

Punt to 6.10 as this still requires further thought and coordination with Haskell'.

Changed 4 years ago by PHO

  • cc pho@… added

Changed 4 years ago by simonmar

  • status changed from new to closed
  • resolution set to fixed

I did as Ross suggested and made the "Letter, Other" class behave as lower-case.

Wed Jul  9 10:12:52 BST 2008  Simon Marlow <marlowsd@gmail.com>
  * Treat the Unicode "Letter, Other" class as lowercase letters (#1103)
  This is an arbitrary choice, but it's strictly more useful than the
  current situation, where these characters cannot be used in
  identifiers at all.
  
  In Haskell' we may revisit this decision (it's on my list of things to
  discuss), but for now this is an improvement for those using caseless
  languages.
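As a rough illustration of what this change permits, the sketch below should be accepted by a GHC that includes the patch, assuming the source file is saved as UTF-8. The identifiers are made up for this example and are not taken from UniTest.hs or CJK.hs. Output is kept ASCII-only so the example does not depend on the terminal's encoding.

```haskell
module Main where

-- Type and data constructor names still need an upper-case first letter,
-- so a Latin capital is prepended, following the convention Ross describes.
data T色 = C赤 | C青
  deriving Eq

-- Variable and function names may start with kana or kanji, because
-- "Letter, Other" characters are lexed as lower-case letters.
挨拶 :: String -> String
挨拶 名前 = "Hello, " ++ 名前

-- A function over the prefixed constructors.
同じ色 :: T色 -> T色 -> Bool
同じ色 x y = x == y

main :: IO ()
main = do
  putStrLn (挨拶 "world")
  print (同じ色 C赤 C青)
```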

Changed 3 years ago by simonmar

  • architecture changed from Unknown to Unknown/Multiple

Changed 3 years ago by simonmar

  • os changed from Multiple to Unknown/Multiple