Codec.Text.IConv.Internal
portable (H98 + FFI), duncan@haskell.org

inBuffer: Current input buffer
inOffset: Current read offset
inLength: Input bytes left
inTotal: Total read offset
outBuffer: Current output buffer
outOffset: Base out offset
outLength: Available output bytes
outFree: Free output space

finalise: This never needs to be used, as the iconv descriptor will be released automatically when no longer needed; however, it can be used to release the descriptor early. Only use this when you can guarantee that the iconv descriptor will no longer be needed, for example if an error occurs or if the input stream ends.

Codec.Text.IConv
portable (H98 + FFI), duncan@haskell.org

UnexpectedError: An unexpected iconv error. The iconv spec lists a number of possible expected errors but does not guarantee that no other errors can occur.

This error can occur either immediately, which might indicate that the iconv installation is messed up somehow, or it could occur later, which might indicate resource exhaustion or some other internal iconv error.

Use errnoToIOError (from Foreign.C.Error) to get slightly more information on what the error could possibly be.

IncompleteChar: This error covers the case where the end of the input has trailing bytes that are the initial bytes of a valid character in the input encoding. In other words, it looks like the input ended in the middle of a multi-byte character. This would often be an indication that the input was somehow truncated. The Int parameter is the byte offset in the input where the incomplete character starts.

InvalidChar: This covers two possible conversion errors:

 * There is a byte sequence in the input that is not valid in the input encoding.

 * There is a valid character in the input that has no corresponding character in the output encoding.

Unfortunately iconv does not let us distinguish these two cases. In either case, the Int parameter gives the byte offset in the input of the unrecognised bytes or unconvertible character.
UnsuportedConversion: The conversion from the input to output string encoding is not supported by the underlying iconv implementation. This is usually because a named encoding is not recognised or support for it was not enabled on this system.

The POSIX standard does not guarantee that all possible combinations of recognised string encodings are supported; however, most common implementations do support all possible combinations.

Span (type): Output spans from encoding conversion. When nothing goes wrong we expect just a sequence of Spans. If there are conversion errors we get other span types.

ConversionError (constructor of Span): An error in the conversion process. If this occurs it will be the last span.

Span (constructor): An ordinary output span in the target encoding.

EncodingName: A string encoding name, eg "UTF-8" or "LATIN1".

The range of string encodings available is determined by the capabilities of the underlying iconv implementation.

When using the GNU C or libiconv libraries, the permitted values are listed by the iconv --list command, and all combinations of the listed values are supported.

convert: Convert text from one named string encoding to another.

 * The conversion is done lazily.

 * An exception is thrown if conversion between the two encodings is not supported.

 * An exception is thrown if there are any encoding conversion errors.

convertFuzzy: Convert text ignoring encoding conversion problems.

If invalid byte sequences are found in the input they are ignored and conversion continues if possible. This is not always possible, especially with stateful encodings. No placeholder character is inserted into the output, so there will be no indication that invalid byte sequences were encountered.

If there are characters in the input that have no direct corresponding character in the output encoding then they are dealt with in one of two ways, depending on the Fuzzy argument. We can try to Transliterate them into the nearest corresponding character(s) or use a replacement character (typically '?' or the Unicode replacement character).
Alternatively they can simply be Discarded.

In either case, no exceptions will occur. In the case of unrecoverable errors, the output will simply be truncated. This includes the case of unrecognised or unsupported encoding names; the output will be empty.

 * This function only works with the GNU iconv implementation, which provides this feature beyond what is required by the iconv specification.

convertStrictly: This variant does the conversion all in one go, so it is able to report any conversion errors up front. It exposes all the possible error conditions and never throws exceptions.

The disadvantage is that no output can be produced before the whole input is consumed. This might be problematic for very large inputs.

convertLazily: This version provides a more complete but less convenient conversion interface. It exposes all the possible error conditions and never throws exceptions.

The conversion is still lazy. It returns a list of spans, where a span may be an ordinary span of output text or a conversion error. This somewhat complex interface allows both for lazy conversion and for precise reporting of conversion problems. The other functions convert and convertFuzzy are actually simple wrappers on this function.

The POSIX iconv API looks like it's designed specifically for streaming, and it is, except for one really, really annoying corner case...

Suppose you're converting a stream, say by reading a file in 4k chunks. This would seem to be the canonical use case for iconv: reading and converting an input file. However, suppose the 4k read chunk happens to split a multi-byte character. Then iconv will stop just before that char and tell us that it's an incomplete char. So far so good. Now what we'd like to do is have iconv remember those last few bytes in its conversion state so we can carry on with the next 4k block. Sadly it does not. It requires us to fix things up so that it can carry on with the next block starting with a complete multi-byte character.
To do that we have to somehow copy those few trailing bytes to the beginning of the next block. That's perhaps not too bad in an imperative context using a mutable input buffer: we'd just copy the few trailing bytes to the beginning of the buffer and do a short read (ie 4k-n, where n is the number of trailing bytes). That's not terribly nice since it means the OS has to do IO on non-page-aligned buffers, which tends to be slower. It's worse for us though, since we're not using a mutable input buffer; we're using a lazy bytestring, which is a sequence of immutable buffers.

So we have to do more cunning things. We could just prepend the trailing bytes to the next block, but that would mean allocating and copying the whole next block just to prepend a couple of bytes. This probably happens quite frequently, so it would be pretty slow. So we have to be even more cunning.

The solution is to create a very small buffer to cover the few bytes making up the character spanning the block boundary. So we copy the trailing bytes plus a few from the beginning of the next block. Then we run iconv again on that small buffer. How many bytes from the next block to copy is a slightly tricky issue. If we copy too few there's no guarantee that we have enough to give a complete character. We opt for a maximum size of 16, on the theory that no encoding in existence uses that many bytes to encode a single character, so it ought to be enough. Yeah, it's a tad dodgy.

Having papered over the block boundary, we still have to cross the boundary of this small buffer. It looks like we've still got the same problem; however, this time we should have crossed over into bytes that are wholly part of the large following block, so we can abandon our small temp buffer and continue with the following block, with a slight offset for the few bytes taken up by the chars that fit into the small buffer.

So yeah, pretty complex. Check out the proof below of the tricky case.
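The small-buffer trick described above can be modelled in a few lines of pure Haskell. This is only a sketch under stated assumptions, not the library's code: splitComplete is a hypothetical stand-in for a single iconv call (it understands only well-formed UTF-8 and performs an identity conversion), convertChunks and fixup are illustrative names for this sketch, and the value 16 for tmpChunkSize is the maximum size quoted in the text.

```haskell
import qualified Data.ByteString as B
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Maximum size of the small boundary buffer (16, as in the text above).
tmpChunkSize :: Int
tmpChunkSize = 16

-- Length in bytes of a UTF-8 character, judged from its lead byte.
-- Assumption: the input is well-formed UTF-8.
charLen :: Word8 -> Int
charLen b
  | b .&. 0x80 == 0x00 = 1
  | b .&. 0xE0 == 0xC0 = 2
  | b .&. 0xF0 == 0xE0 = 3
  | b .&. 0xF8 == 0xF0 = 4
  | otherwise          = 1

-- Hypothetical stand-in for one iconv call: split a buffer into the longest
-- prefix of complete characters and the trailing bytes of an incomplete one.
splitComplete :: B.ByteString -> (B.ByteString, B.ByteString)
splitComplete bs = B.splitAt (go 0) bs
  where
    n = B.length bs
    go i
      | i >= n                         = i
      | i + charLen (B.index bs i) > n = i
      | otherwise                      = go (i + charLen (B.index bs i))

-- Convert a stream of immutable chunks; every output chunk holds only
-- complete characters.
convertChunks :: [B.ByteString] -> [B.ByteString]
convertChunks []       = []
convertChunks (c : cs) = emit done (fixup trailing cs)
  where
    (done, trailing) = splitComplete c
    emit out rest = if B.null out then rest else out : rest

-- Handle the bytes left over at a block boundary.
fixup :: B.ByteString -> [B.ByteString] -> [B.ByteString]
fixup trailing rest
  | B.null trailing = convertChunks rest   -- nothing was split
fixup _ [] = []                            -- input ended mid-character; dropped here
fixup trailing (next : rest)
  -- We crossed into 'next': abandon the small buffer and resume in 'next'
  -- at an offset equal to the bytes of 'next' already converted.
  | k > n     = done : convertChunks (B.drop (k - n) next : rest)
  -- 'next' was too short to complete the character: carry everything on.
  | otherwise = fixup small rest
  where
    n         = B.length trailing
    small     = B.append trailing (B.take (tmpChunkSize - n) next)
    (done, _) = splitComplete small
    k         = B.length done
```

For example, the chunks [0x61,0xE2], [0x82], [0xAC,0x62] (the text "a€b" split twice inside the three-byte euro sign) come out as [0x61] followed by [0xE2,0x82,0xAC,0x62]. Only the small buffer is ever copied; the large following block is reused via B.drop, which shares the underlying storage.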
Arguments

convertFuzzy:
 * Whether to try and transliterate or discard characters with no direct conversion
 * Name of input string encoding
 * Name of output string encoding
 * Input text
 * Output text

convert:
 * Name of input string encoding
 * Name of output string encoding
 * Input text
 * Output text

convertStrictly:
 * Name of input string encoding
 * Name of output string encoding
 * Input text
 * Output text or conversion error

convertLazily:
 * Name of input string encoding
 * Name of output string encoding
 * Input text
 * Output text spans

Names

Package: iconv-0.4.1.1
Modules: Codec.Text.IConv, Codec.Text.IConv.Internal, Foreign.C.Error (errnoToIOError)

Fuzzy, Discard, Transliterate, ConversionError, UnexpectedError, IncompleteChar, InvalidChar, UnsuportedConversion, Span, EncodingName, reportConversionError, convert, convertFuzzy, convertStrictly, convertLazily

inBuffer, inOffset, inLength, inTotal, outBuffer, outOffset, outLength, outFree, finalise, ConversionDescriptor, Status, OutputFull, InputEmpty, InitStatus, UnexpectedInitError, UnsupportedConversion, InitOk, IConv, I, unI, Buffers, c_iconv_close, c_iconv, c_iconv_open, pushInputBuffer, inputBufferEmpty, inputBufferSize, inputPosition, replaceInputBuffer, newOutputBuffer, popOutputBuffer, outputBufferBytesAvailable, outputBufferFull, nullBuffers, returnI, bindI, thenI, run, unsafeLiftIO, unsafeInterleave, get, gets, modify, trace, dump, iconv, $fMonadIConv

fixupBoundary, tmpChunkSize, InvalidCharBehaviour, IgnoreInvalidChar, StopOnInvalidChar, convertInternal, fillInputBuffer, drainBuffers, invalidChar, failConversion, outChunkSize
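The span-based result described for the lazy interface (a list of output spans, with any conversion error as the last span) can be consumed in several ways. The sketch below is illustrative only. Its Span and ConversionError types are local stand-ins whose shapes are assumed from the descriptions in this document (UnexpectedError is left out), and collectStrict, collectLazy and collectTruncated are hypothetical helper names, not functions of the package. It suggests how simpler wrappers might be layered on a list of spans.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L

-- Local stand-ins for the documented types (assumed shapes, not the
-- package's own definitions).
data ConversionError
  = UnsuportedConversion String String
  | InvalidChar Int
  | IncompleteChar Int
  deriving (Eq, Show)

data Span
  = Span S.ByteString
  | ConversionError ConversionError
  deriving (Eq, Show)

-- All-in-one-go style: walk every span first, then report either the
-- error or the complete output. No output is available before the end.
collectStrict :: [Span] -> Either ConversionError L.ByteString
collectStrict = go []
  where
    go acc []                      = Right (L.fromChunks (reverse acc))
    go acc (Span chunk : rest)     = go (chunk : acc) rest
    go _   (ConversionError e : _) = Left e

-- Lazy style: output chunks are produced as spans are demanded; an error
-- span only fails when evaluation reaches it.
collectLazy :: [Span] -> L.ByteString
collectLazy = L.fromChunks . go
  where
    go []                      = []
    go (Span chunk : rest)     = chunk : go rest
    go (ConversionError e : _) = error ("conversion failed: " ++ show e)

-- Never fails: keeps the ordinary spans, so the output is simply
-- truncated where an unrecoverable error occurred.
collectTruncated :: [Span] -> L.ByteString
collectTruncated spans = L.fromChunks [chunk | Span chunk <- spans]
```

The trade-off mirrors the one described above: the strict collector can report problems up front but must consume the whole input, while the lazy collector can stream output at the cost of failing late.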