charsetdetect-1.0: Character set detection using Mozilla's Universal Character Set Detector

Codec.Text.Detect

Description

Detect the likely character encoding for a stream of bytes using Mozilla's Universal Character Set Detector.

Documentation

detectEncodingName :: ByteString -> Maybe String

Detect the likely encoding used by a ByteString. At the time of writing, the encoding returned will be drawn from this list:

 Big5
 EUC-JP
 EUC-KR
 GB18030
 gb18030
 HZ-GB-2312
 IBM855
 IBM866
 ISO-2022-CN
 ISO-2022-JP
 ISO-2022-KR
 ISO-8859-2
 ISO-8859-5
 ISO-8859-7
 ISO-8859-8
 KOI8-R
 Shift_JIS
 TIS-620
 UTF-8
 UTF-16BE
 UTF-16LE
 UTF-32BE
 UTF-32LE
 windows-1250
 windows-1251
 windows-1252
 windows-1253
 windows-1255
 x-euc-tw
 X-ISO-10646-UCS-4-2143
 X-ISO-10646-UCS-4-3412
 x-mac-cyrillic

Note that gb18030 appears in the list under two capitalisations. For this reason, and to guard against similar inconsistencies in character sets added later, we recommend comparing character set names case-insensitively.
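As an illustration of the recommendation above, a case-insensitive comparison might look like the following minimal sketch. The helper names sameCharset and detectedAs are hypothetical and not part of this package; the sketch uses only base and assumes the names are ASCII, as every name in the list above is.

```haskell
import Data.Char (toLower)

-- | Compare two character set names, ignoring case.
-- Hypothetical helper; not exported by Codec.Text.Detect.
sameCharset :: String -> String -> Bool
sameCharset a b = map toLower a == map toLower b

-- | Check a detection result (a value of the type returned by
-- detectEncodingName) against an expected character set name.
-- Hypothetical helper; a missing result never matches.
detectedAs :: String -> Maybe String -> Bool
detectedAs expected = maybe False (sameCharset expected)
```

With these helpers, an expression such as detectedAs "gb18030" (detectEncodingName bytes) would accept either capitalisation from the list.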

detectEncoding :: ByteString -> IO (Maybe TextEncoding)

Detect the encoding for a ByteString and attempt to create a TextEncoding suitable for decoding it.
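
Once a TextEncoding is in hand, one possible way to apply it to in-memory bytes is sketched below. This is an assumption about usage rather than something this module provides: decodeWith and demo are hypothetical names, the sketch works on a strict ByteString (which may differ from the ByteString type this module accepts), and it relies on peekCStringLen from GHC.Foreign in base. The demo uses the utf8 encoding from System.IO in place of a detected one so that the block stands alone.

```haskell
import qualified Data.ByteString as B
import qualified GHC.Foreign as GHC
import System.IO (TextEncoding, utf8)

-- | Decode strict bytes to a String using the given TextEncoding.
-- Hypothetical helper; not exported by Codec.Text.Detect.
decodeWith :: TextEncoding -> B.ByteString -> IO String
decodeWith enc bytes =
  -- Expose the bytes as a C buffer, then let base decode that buffer.
  B.useAsCStringLen bytes (GHC.peekCStringLen enc)

-- | Decode the two bytes of "hi" using System.IO's utf8 encoding,
-- standing in here for an encoding obtained from detectEncoding.
demo :: IO String
demo = decodeWith utf8 (B.pack [104, 105])
```

In real use the encoding would come from the Just case of detectEncoding, with the Nothing case handled separately, for example by falling back to a default encoding.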