Ticket #5599 (closed bug: fixed)

Opened 20 months ago

Last modified 19 months ago

msys has bad Unicode support

Reported by: simonpj
Owned by: igloo
Priority: normal
Milestone: 7.4.1
Component: Compiler
Version: 7.2.1
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Difficulty:
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

Tests 3307 and environment001 pass on Cygwin and Linux but fail on msys:

>    lib/IO                        3307 [bad exit code] (normal)
>    lib/IO                        environment001 [bad stdout] (normal)

Here is Max's diagnosis:

Basically, msys has poor Unicode support. To see this, write a program "len.c" like this:

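#include <windows.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
	/* The full command line as UTF-16, straight from the Windows API */
	LPWSTR cmdLine = GetCommandLineW();

	int argc;
	LPWSTR *argv = CommandLineToArgvW(cmdLine, &argc);
	if (argv == NULL || argc < 2) {
		fprintf(stderr, "usage: a.exe ARG\n");
		return 1;
	}

	/* wcslen returns size_t; cast it to match the %lu conversion */
	printf("%d args, %lu wide chars in first arg\n", argc, (unsigned long)wcslen(argv[1]));
	LocalFree(argv);
	return 0;
}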

Create a UTF-8 encoded file called "utf8" containing two characters:

不好

And then execute it like so:

gcc len.c && ./a.exe $(cat utf8)

(NB: it is irrelevant whether you use Cygwin gcc or msys gcc: this is an issue with the shells)

You get different results on msys and Cygwin:

  • On Cygwin, you get 2 wide characters in the first argument; i.e. the UTF-16 encoded Chinese text
  • On msys, you get 6 wide characters in the first argument; i.e. one 16-bit value for every byte in the UTF-8 encoded Chinese text

IMHO the msys behaviour is broken because the command line arguments supplied via the Windows API are meant to be UTF-16. It does, however, match the behaviour of Windows cmd if you do this:

set /p myvar= < utf8
a.exe %myvar%

(You get "6 wide characters" printed)

Perhaps the issue in cmd stems from the fact that the Windows console is stuck in code page 850 and doesn't support the UTF-8 "code page". But msys really has no excuse since it reports itself as being UTF-8.

I'm not sure what to do here because I don't think our code actually has a problem, and the test does pass (and check something useful) on Linux, OS X and Cygwin. But still, something is not working quite right here. Perhaps just mark it as expect-fail on msys?

Change History

Changed 19 months ago by igloo

  • owner set to igloo
  • milestone set to 7.4.1

Changed 19 months ago by igloo

  • status changed from new to closed
  • resolution set to fixed

Fixed by:

commit f6f20381c8064f7f98f9b9ab082e5ad65c132be9
Author: Ian Lynagh <igloo@earth.li>
Date:   Sun Nov 27 16:36:55 2011 +0000

    Expect 3307 and environment001 to fail on msys; fixes trac #5599

    Unicode support on MSYS seems to be broken.