id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	os	architecture	failure	difficulty	testcase	blockedby	blocking	related
4373	Lexer does not handle unicode numeric subscripts	liamoc	simonmar	"Hi all,

I would fix this myself but the GHC Lexer looks rather fragile and I'd be afraid of breaking something. I can have a crack at it and write a patch if you like.

Currently GHC rejects perfectly good Unicode identifier characters (numeric subscripts).


For example, the following expression:


{{{
let v₂ = (+) in v₂ 1 3
}}}


gives:


{{{
lexical error at character '\8322'
}}}


The subscripts are in the ""!OtherNumber"" general Unicode category, so I'm pretty sure the main change is to Lexer.x, changing:


{{{
   OtherNumber           -> other_graphic 
}}}

to some other category (in the definition of alexGetChar).
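
As a quick sanity check on the category claim, here is a minimal sketch that uses only Data.Char from base. It does not touch GHC's lexer, and the top-level names (subscriptTwo and friends) are made up for illustration:

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isDigit, ord)

-- The character reported in the lexical error, written as a decimal escape.
subscriptTwo :: Char
subscriptTwo = '\8322'

-- '\8322' is SUBSCRIPT TWO, code point U+2082.
isSubscriptTwo :: Bool
isSubscriptTwo = ord subscriptTwo == 0x2082

-- Data.Char places it in OtherNumber rather than DecimalNumber.
isOtherNumber :: Bool
isOtherNumber = generalCategory subscriptTwo == OtherNumber

-- isDigit only accepts the ASCII digits, so the subscript is not a digit.
isNotAsciiDigit :: Bool
isNotAsciiDigit = not (isDigit subscriptTwo)
```

All three Bool values should evaluate to True when the module is loaded in GHCi.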

The main issue I see here is that we can't just change ""other_graphic"" to ""digit"". The subscripts would have to be treated like ' or _ rather than as digits; otherwise it would become acceptable to use them as digits in numeric literals, which I don't think we want.
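
To make the distinction concrete, here is a rough sketch of the classification I have in mind. It is not the code in Lexer.x: the CharClass type and the classify function are hypothetical names, and the real lexer has many more classes than these three.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, isDigit)

-- Hypothetical character classes; these are NOT the names used in Lexer.x.
data CharClass
  = Digit      -- may appear in numeric literals
  | IdentTail  -- may continue an identifier, like ' or _
  | Other      -- everything else in this sketch
  deriving (Eq, Show)

-- Hypothetical classifier: OtherNumber characters such as '\8322' join the
-- identifier-tail class, so they never count as digits in a literal.
classify :: Char -> CharClass
classify c
  | isDigit c = Digit
  | c == '\'' || c == '_' = IdentTail
  | generalCategory c == OtherNumber = IdentTail
  | otherwise = Other
```

With this split, classify '2' gives Digit while classify '\8322' gives IdentTail, so v₂ could lex as one identifier without the subscript ever being read as part of a number.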

Seeing as I am not confident enough in GHC's lexer/parser structure to make these changes, I was wondering if anyone who is more experienced who has the time could do it."	feature request	closed	normal	7.4.1	Compiler (Parser)		fixed	lexer, unicode, tiny		Unknown/Multiple	Unknown/Multiple	None/Unknown					
