multiarg- Combinators to build command line parsers

Safe Haskell: Safe-Inferred



Get the arguments from the command line, ensuring they are properly encoded into Unicode.

Versions of base older than 4.4 have a System.Environment.getArgs that does not return a Unicode string. Instead, it simply puts each octet into a separate Char. Thus that getArgs is broken on UTF-8 and on nearly any other non-ASCII encoding. As a workaround I use System.Environment.UTF8 from the utf8-string package. The downside is that it requires the command line to be encoded in UTF-8, regardless of what the default system encoding is.

Unlike earlier versions, base 4.4 actually returns a proper Unicode string when you call System.Environment.getArgs. (base 4.3 comes with ghc 7.0.4; base 4.4 comes with ghc 7.2.) The string is decoded according to the default system locale. The only problem is that System.Environment.UTF8 apparently just calls System.Environment.getArgs and then assumes that the string it returns has not been decoded. In other words, System.Environment.UTF8 assumes that System.Environment.getArgs is broken, so now that System.Environment.getArgs has been fixed in base 4.4, System.Environment.UTF8 will likely break.
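The difference between the two behaviours can be illustrated with a small sketch. The function decodeOctets below is a hypothetical stand-in written only for this illustration; it is not the decoder from utf8-string, and it skips checks (such as rejecting overlong encodings) that a production decoder would perform. It treats every Char as one octet and decodes the sequence as UTF-8, which is roughly what is needed when getArgs returns raw octets, and which damages a string that has already been decoded.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- | Hypothetical helper: interpret each Char as a single octet and
-- decode the sequence as UTF-8. Anything that is not well-formed
-- (including a Char above 0xFF) becomes U+FFFD.
decodeOctets :: String -> String
decodeOctets [] = []
decodeOctets (c:cs)
  | b < 0x80              = c : decodeOctets cs          -- plain ASCII
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)         -- 2-byte sequence
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)         -- 3-byte sequence
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)         -- 4-byte sequence
  | otherwise             = '\xFFFD' : decodeOctets cs   -- stray byte
  where
    b = ord c
    -- Consume n continuation bytes and combine them with the lead bits.
    multi n lead =
      let (conts, rest) = splitAt n cs
          v = foldl step lead conts
      in if length conts == n && all isCont conts && v <= 0x10FFFF
           then chr v : decodeOctets rest
           else '\xFFFD' : decodeOctets cs
    isCont x = ord x .&. 0xC0 == 0x80
    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)

main :: IO ()
main = do
  -- Raw octets, as an old getArgs would deliver them: decoding helps.
  print (decodeOctets "caf\xC3\xA9" == "caf\xE9")
  -- Already decoded text, as a fixed getArgs delivers it: decoding a
  -- second time replaces the accented letter with U+FFFD.
  print (decodeOctets "caf\xE9" == "caf\xFFFD")
```

The second case is the failure mode described above: a decoder that assumes undecoded input corrupts text once getArgs starts decoding on its own.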

One obvious solution to this problem is to find some other way to get the command line that will not break when base is updated. But it was not easy to find such a thing. The other libraries I saw on Hackage (as of January 6, 2012) had problems, such as breakage on ghc 7.2. There is a package that has a simple interface to the UNIX setlocale(3) function, but I'm not sure that what it returns maps easily and reliably to character encodings that you can use with, say, iconv.

So by use of Cabal and preprocessor macros, the code uses utf8-string if base is older than 4.4, and uses System.Environment.getArgs if base is at least 4.4.
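A minimal sketch of this kind of conditional compilation follows. It is an illustration under stated assumptions, not the package's actual source: the module layout and the alias Env are made up for the example. MIN_VERSION_base is the macro that Cabal generates for the base dependency; the fallback definition at the top is only there so the file also compiles when that macro is not defined. The utf8-string branch is compiled only against an old base, and it assumes a utf8-string release that still ships System.Environment.UTF8.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- Assumption for standalone use: if the build did not define the
-- macro, behave as though base is new enough.
#ifndef MIN_VERSION_base
#define MIN_VERSION_base(major,minor,patch) 1
#endif

#if MIN_VERSION_base(4,4,0)
-- base 4.4 or newer: getArgs already decodes using the system locale.
import qualified System.Environment as Env
#else
-- Older base: fall back to utf8-string, which assumes UTF-8 input.
import qualified System.Environment.UTF8 as Env
#endif

-- | Command-line arguments, decoded by whichever module was selected.
getArgs :: IO [String]
getArgs = Env.getArgs

-- | Program name, decoded by whichever module was selected.
getProgName :: IO String
getProgName = Env.getProgName

main :: IO ()
main = do
  name <- getProgName
  args <- getArgs
  putStrLn (name ++ ": " ++ show (length args) ++ " argument(s)")
  mapM_ putStrLn args
```

Selecting the import rather than wrapping each function keeps the conditional in one place, so callers see the same getArgs and getProgName regardless of the base version.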

The GHC bug is here:



getArgs :: IO [String]

Gets the command-line arguments supplied by the program's user. If the base package is older than version 4.4, then this function assumes the command line is encoded in UTF-8, which is true for many newer Unix systems; however, many older systems may use single-byte encodings like ISO-8859. In such cases, this function will give erroneous results.

If the base package is version 4.4.0 or newer, this function simply uses the getArgs that comes with base. That getArgs detects the system's default encoding and uses that, so it should give accurate results on most systems.

getProgName :: IO String

Gets the name of the program that the user invoked. See documentation for getArgs for important caveats that also apply to this function.