Get the arguments from the command line, ensuring they are properly decoded into Unicode.
base 4.3.1.0 has a System.Environment.getArgs that does not return a Unicode string. Instead, it simply puts each octet into a separate Char. Thus its getArgs is broken for UTF-8 and nearly any other non-ASCII encoding. As a workaround I use System.Environment.UTF8 from the utf8-string package. The downside is that this requires the command line to be encoded in UTF-8, regardless of what the default system encoding is.
Unlike base 4.3.1.0, base 4.4.0.0 actually returns a proper Unicode string when you call System.Environment.getArgs. (base 4.3.1.0 comes with ghc 7.0.4; base 4.4.0.0 comes with ghc 7.2.) The string is decoded according to the default system locale. The only problem is that System.Environment.UTF8 apparently just calls System.Environment.getArgs and then assumes that the string it returns has not been decoded. In other words, System.Environment.UTF8 assumes that System.Environment.getArgs is broken, so now that System.Environment.getArgs has been fixed in base 4.4.0.0, System.Environment.UTF8 will likely break, because it decodes a string that has already been decoded.
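To make the two behaviours above concrete, here is a minimal sketch of what "one octet per Char" means and why decoding twice goes wrong. The function decodeOctets is a hypothetical stand-in written for this illustration; it is not the code used by System.Environment.UTF8, and it is a lenient decoder that does not reject overlong encodings.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- | Hypothetical helper: treat every Char as a single UTF-8 octet
-- (the shape of string the old getArgs returned) and decode the
-- sequence into real Unicode characters. Anything that is not a
-- well-formed sequence becomes U+FFFD.
decodeOctets :: String -> String
decodeOctets [] = []
decodeOctets (c:cs)
  | b < 0x80              = c : decodeOctets cs        -- plain ASCII
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)       -- 2-byte sequence
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)       -- 3-byte sequence
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)       -- 4-byte sequence
  | otherwise             = replacement : decodeOctets cs
  where
    b = ord c
    replacement = '\xFFFD'
    -- Continuation octets have the bit pattern 10xxxxxx.
    isCont x = ord x .&. 0xC0 == 0x80
    -- Append the low six bits of a continuation octet.
    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)
    multi n lead =
      let (conts, rest) = splitAt n cs
          v = foldl step lead conts
      in if length conts == n && all isCont conts && v <= 0x10FFFF
           then chr v : decodeOctets rest
           else replacement : decodeOctets cs
```

With the undecoded string "caf\195\169" (the UTF-8 octets of "café", one per Char), decodeOctets yields "caf\233", which is correct. With the already decoded string "caf\233", it sees 0xE9 as the start of a three-byte sequence that never arrives and yields "caf\xFFFD". That second case is the kind of breakage to expect when a UTF-8 decoding layer is applied on top of a getArgs that already decodes.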
One obvious solution to this problem is to find some other way to get the command line that will not break when base is updated. But it was not easy to find such a thing. The other libraries I saw on hackage (as of January 6, 2012) had problems, such as breakage on ghc 7.2. There is a package that has a simple interface to the UNIX setlocale(3) function, but I'm not sure that what it returns easily and reliably maps to character encodings that you can use with, say, iconv.
So, by use of Cabal and preprocessor macros, the code uses utf8-string if base is older than 4.4, and uses System.Environment.getArgs if base is at least 4.4.
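The version switch described above might look roughly like the following. This is a sketch under assumptions, not necessarily the exact code in this module: it relies on the MIN_VERSION_base macro, which Cabal generates for each dependency (GHC 8.0 and later also define it without Cabal), and on utf8-string being listed as a dependency when building against an old base. The qualified alias E is only an illustrative name.

```haskell
{-# LANGUAGE CPP #-}

-- Pick the getArgs implementation at compile time, based on the
-- version of base this code is being built against.
#if MIN_VERSION_base(4,4,0)
-- base >= 4.4: getArgs already decodes using the system locale.
import qualified System.Environment as E (getArgs)
#else
-- Older base: getArgs returns raw octets, so use the utf8-string
-- wrapper, which assumes the command line is UTF-8.
import qualified System.Environment.UTF8 as E (getArgs)
#endif

-- | Command-line arguments as Unicode strings.
getArgs :: IO [String]
getArgs = E.getArgs
```

Because the choice is made by the preprocessor, only one of the two imports is ever compiled, so a build against a newer base never needs System.Environment.UTF8 to be available.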
The GHC bug is here:
Gets the command-line arguments supplied by the program's user.
If the base package is older than version 4.4, this function assumes the command line is encoded in UTF-8, which is true for many newer Unix systems; however, many older systems may use single-byte encodings like ISO-8859. In such cases, this function will give erroneous results.
If the base package is version 4.4.0 or newer, this function simply uses the getArgs that comes with base. That getArgs detects the system's default encoding and uses that, so it should give accurate results on most systems.
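When it is unclear which of the two behaviours a given build has, a small diagnostic along these lines may help. It is only a sketch: showCodePoints and reportArgs are hypothetical names introduced here, and the program calls the getArgs from base directly rather than the wrapper documented above.

```haskell
import Data.Char (ord)
import System.Environment (getArgs)
import System.IO (localeEncoding)

-- | Hypothetical helper: show an argument (escaped, so that it is
-- safe to print in any locale) next to the code point of each Char.
showCodePoints :: String -> String
showCodePoints arg = show arg ++ " -> " ++ show (map ord arg)

-- | Print the encoding GHC picked up from the locale, then the
-- code points of every command-line argument.
reportArgs :: IO ()
reportArgs = do
  putStrLn ("locale encoding: " ++ show localeEncoding)
  args <- getArgs
  mapM_ (putStrLn . showCodePoints) args
```

If the argument é is reported as [233], the string was decoded into a single Unicode character. If it is reported as [195,169], each UTF-8 octet ended up in its own Char, which is the pre-4.4 behaviour.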