Ticket #5599 (closed bug: fixed)
msys has bad Unicode support
| Reported by: | simonpj | Owned by: | igloo |
|---|---|---|---|
| Priority: | normal | Milestone: | 7.4.1 |
| Component: | Compiler | Version: | 7.2.1 |
| Keywords: | | Cc: | |
| Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
| Type of failure: | None/Unknown | Difficulty: | |
| Test Case: | | Blocked By: | |
| Blocking: | | Related Tickets: | |
Description
Tests 3307 and environment001 pass on Cygwin and Linux, but fail on msys:
> lib/IO 3307 [bad exit code] (normal)
> lib/IO environment001 [bad stdout] (normal)
Here is Max's diagnosis:
Basically, msys has kind of bad Unicode support. If you write a program "len.c" like this:
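#include <windows.h>
#include <shellapi.h>  /* CommandLineToArgvW */
#include <stdio.h>
#include <wchar.h>     /* wcslen */
int main(int _argc, char **_argv) {
    /* Ask Windows for the raw UTF-16 command line instead of using the C runtime's argv. */
    LPWSTR cmdLine = GetCommandLineW();
    int argc;
    LPWSTR *argv = CommandLineToArgvW(cmdLine, &argc);
    if (argv == NULL || argc < 2) {
        fprintf(stderr, "expected one argument\n");
        return 1;
    }
    /* wcslen returns size_t, so cast it to match %d. */
    printf("%d args, %d wide chars in first arg\n", argc, (int)wcslen(argv[1]));
    LocalFree(argv);
    return 0;
}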
Create a UTF-8 encoded file called "utf8" containing two characters:
不好
And then execute it like so:
gcc len.c && ./a.exe $(cat utf8)
(NB: it is irrelevant whether you use Cygwin gcc or msys gcc: this is an issue with the shells)
You get different results on msys and Cygwin:
- On Cygwin, you get 2 wide characters in the first argument; i.e. the UTF-16 encoded Chinese text
- On msys, you get 6 wide characters in the first argument; i.e. one 16-bit value for every byte in the UTF-8 encoded Chinese text
IMHO the msys behaviour is broken because the command line arguments supplied via the Windows API are meant to be UTF-16. It does match the behaviour of Windows cmd if you do this:
set /p myvar= < utf8
a.exe %myvar%
(You get "6 wide characters" printed)
Perhaps the issue in cmd stems from the fact that the Windows console is stuck in code page 850 and doesn't support the UTF-8 "code page". But msys really has no excuse since it reports itself as being UTF-8.
I'm not sure what to do here because I don't think our code actually has a problem, and the test does pass (and check something useful) in Linux, OS X and Cygwin. But still, something is not working quite right here. Perhaps just mark it as expect-fail in msys?
