liboleg-2010.1.5: An evolving collection of Oleg Kiselyov's Haskell modules



A general-purpose TIFF library

The library gives the user the TIFF dictionary, which the user can search for specific tags and obtain the values associated with the tags, including the pixel matrix.

The overarching theme is incremental processing: initially, only the TIFF dictionary is read. The value associated with a tag is read only when that tag is looked up (unless the value was short and was packed in the TIFF dictionary entry). The pixel matrix (let alone the whole TIFF file) is not loaded in memory -- the pixel matrix is not even located before it is needed. The matrix is processed incrementally, by a user-supplied iteratee.

The incremental processing is accomplished by iteratees and enumerators. The enumerators are indeed first-class: they are stored in the interned TIFF dictionary data structure. These enumerators represent the values associated with tags; a value is read on demand, when its enumerator is applied to a user-given iteratee.
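The interplay of iteratees and enumerators described above can be modelled with a minimal sketch. All names here (Iter, Enumerator, enumList, run, sumFirst) are hypothetical and pure; the library's own IterateeG and IterateeGM types are monadic and richer, so this is only an illustration of the idea, not the library's definitions.

```haskell
-- A toy iteratee: either finished with a result, or waiting for input.
-- Hypothetical type, not the library's IterateeG.
data Iter el a
  = Done a
  | Cont (Maybe el -> Iter el a)   -- Nothing signals end of stream

-- An enumerator feeds a data source to an iteratee and returns the
-- resulting iteratee. It is an ordinary value, so it can be stored in
-- a data structure and applied later, on demand.
type Enumerator el a = Iter el a -> Iter el a

-- Enumerate a list, stopping as soon as the iteratee is done.
enumList :: [el] -> Enumerator el a
enumList (x:xs) (Cont k) = enumList xs (k (Just x))
enumList _      it       = it

-- Extract the result, sending end-of-stream if the iteratee still waits.
run :: Iter el a -> Maybe a
run (Done a) = Just a
run (Cont k) = case k Nothing of
  Done a -> Just a
  Cont _ -> Nothing

-- A sample iteratee: sum the first n elements, then stop.
sumFirst :: Int -> Iter Int Int
sumFirst n = go n 0
  where
    go 0 acc = Done acc
    go i acc = Cont step
      where
        step (Just x) = go (i - 1) (acc + x)
        step Nothing  = Done acc
```

Because enumList stops feeding once the iteratee reports Done, applying it to an unbounded source such as [1..] still terminates.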

The library extensively uses nested streams, tacitly converting the stream of raw bytes from the file into streams of integers, rationals and other user-friendly items. The pixel matrix is presented as a contiguous stream, regardless of its segmentation into strips and physical arrangement. The library demonstrates random IO and binary parsing, including reading multi-byte numeric data in big- or little-endian formats. The library can be easily adapted for AIFF, RIFF and other IFF formats.
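As an illustration of reading multi-byte numeric data in either byte order, here is a small sketch over plain byte lists. The names Endian, endianFromMark and decodeWord are hypothetical and are not the library's API; the only TIFF fact relied on is that a file starts with the two bytes "II" (little-endian) or "MM" (big-endian).

```haskell
import Data.List (foldl')
import Data.Word (Word8)

data Endian = BigEndian | LittleEndian
  deriving (Eq, Show)

-- The first two bytes of a TIFF file select the byte order:
-- "II" (0x49 0x49) is little-endian, "MM" (0x4D 0x4D) is big-endian.
endianFromMark :: Word8 -> Word8 -> Maybe Endian
endianFromMark 0x49 0x49 = Just LittleEndian
endianFromMark 0x4D 0x4D = Just BigEndian
endianFromMark _    _    = Nothing

-- Combine the bytes of an unsigned number of any width.
decodeWord :: Endian -> [Word8] -> Integer
decodeWord e bs = foldl' step 0 (msbFirst e)
  where
    -- Arrange the bytes most significant first.
    msbFirst BigEndian    = bs
    msbFirst LittleEndian = reverse bs
    step acc b = acc * 256 + toInteger b
```

For instance, the bytes 0, 0, 1, 2 read big-endian and the bytes 2, 1, 0, 0 read little-endian both denote 258.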

We show a representative application of the library: reading a sample TIFF file, printing selected values from the TIFF dictionary, verifying the values of selected pixels and computing the histogram of pixel values. The pixel verification procedure stops reading the pixel matrix as soon as all specified pixel values are verified. The histogram accumulation does read the entire matrix, but incrementally. Neither pixel matrix processing procedure loads the whole matrix in memory. In fact, we never read and retain more than the IO-buffer-full of raw data.



compute_hist :: TIFFDict -> IterateeGM Word8 RBIO (Int, IntMap Int)

Sample TIFF user code

The following is sample code using the TIFF library (whose implementation is in the second part of this file). Our sample code prints interesting information from the TIFF dictionary (such as the dimensions, the resolution and the name of the image).

The sample file is a GNU logo, converted from JPG to TIFF. Copyleft by GNU.

The main user function. tiff_reader is the library function that builds the TIFF dictionary; process_tiff is the user function that extracts useful data from the dictionary.

Sample TIFF processing function

Sample processing of the pixel matrix: computing the histogram
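The accumulation can be pictured as a strict left fold over the pixel stream. The sketch below works on a plain list rather than on the library's iteratees, and the names histStep and histogram are hypothetical. Pairing a pixel count with an IntMap of per-value counts mirrors the (Int, IntMap Int) result type of compute_hist, although reading the first component as the pixel count is an assumption.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.List (foldl')
import Data.Word (Word8)

-- One accumulation step: bump the pixel count and the bin for this value.
-- Both components are forced so that no chain of thunks builds up.
histStep :: (Int, IM.IntMap Int) -> Word8 -> (Int, IM.IntMap Int)
histStep (n, h) p = n' `seq` h' `seq` (n', h')
  where
    n' = n + 1
    h' = IM.insertWith (+) (fromIntegral p) 1 h

-- Fold over the pixels one at a time; only the accumulator is retained.
histogram :: [Word8] -> (Int, IM.IntMap Int)
histogram = foldl' histStep (0, IM.empty)
```

With 8-bit pixels the map never holds more than 256 bins, however large the image is.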

type EnumeratorGMM elfrom elto m a = IterateeG elto m a -> IterateeGM elfrom m a

Another sample processor of the pixel matrix: verifying values of some pixels

This processor does not read the whole matrix; it stops as soon as everything is verified or an error is detected.
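The early-stopping behaviour can be sketched over a lazy list standing in for the pixel stream. The function verifyPixels is hypothetical; it assumes the positions to check are non-negative and given in strictly ascending order, and it reports how many pixels were consumed.

```haskell
import Data.Word (Word8)

-- Check expected values at given stream positions.
-- Assumption: positions are non-negative and strictly ascending.
-- Returns Right (number of pixels consumed) or Left with a description
-- of the first failure.
verifyPixels :: [(Int, Word8)] -> [Word8] -> Either String Int
verifyPixels = go 0
  where
    -- Nothing left to check: stop without touching the rest of the stream.
    go n [] _ = Right n
    go _ ((i, _) : _) [] = Left ("stream ended before pixel " ++ show i)
    go n checks@((i, v) : rest) (p : ps)
      | n < i     = go (n + 1) checks ps      -- not yet at a checked position
      | p == v    = go (n + 1) rest ps        -- verified, move to next check
      | otherwise = Left ("pixel " ++ show i ++ ": expected " ++ show v
                          ++ ", got " ++ show p)
```

Since the first clause ignores the remaining stream, the check also terminates on an unbounded input such as cycle [1,2,3].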

TIFF library code

We need a more general enumerator type: an enumerator that maps streams (not necessarily in lock-step). This is a flattened (`joinI-ed') EnumeratorN elfrom elto m a

type TIFFDict = IntMap TIFFDE

A TIFF directory is a finite map associating a TIFF tag with a record TIFFDE
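To make the shape of such a map concrete, here is a simplified model in which each entry carries a deferred action that produces the value only when the tag is looked up. The Entry record, its fields, lookupInt and sampleDict are hypothetical stand-ins; the actual TIFFDE record stores an enumerator rather than an IO action. The tag numbers 256 (ImageWidth) and 257 (ImageLength) come from the TIFF specification, while the values 640 and 480 are made up for the example.

```haskell
import qualified Data.IntMap as IM

-- Hypothetical directory entry: the declared element count and a
-- deferred action that fetches the value(s) when asked.
data Entry = Entry
  { entryCount :: Int
  , entryValue :: IO [Integer]
  }

type Dict = IM.IntMap Entry

-- Look up a tag and run its deferred reader; succeed only for a
-- single-valued entry.
lookupInt :: Int -> Dict -> IO (Maybe Integer)
lookupInt tag dict = case IM.lookup tag dict of
  Nothing -> return Nothing
  Just e  -> do
    vs <- entryValue e
    return (case vs of
              [v] -> Just v
              _   -> Nothing)

-- A made-up dictionary for illustration.
sampleDict :: Dict
sampleDict = IM.fromList
  [ (256, Entry 1 (return [640]))   -- ImageWidth
  , (257, Entry 1 (return [480]))   -- ImageLength
  ]
```

Building sampleDict runs none of the stored actions; only lookupInt does, and only for the tag requested.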

data TIFFDE



tiff_reader :: IterateeGM Word8 RBIO (Maybe TIFFDict)

The library function to read the TIFF dictionary

u32_to_float :: Word32 -> Double

A few conversion procedures

load_dict :: IterateeGM Word8 RBIO (Maybe TIFFDict)

An internal function to load the dictionary. It assumes that the stream is already positioned at the start of the dictionary.

pixel_matrix_enum :: TIFFDict -> EnumeratorN Word8 Word8 RBIO a

Reading the pixel matrix

For simplicity, we assume no compression and 8-bit pixels.
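Under those two assumptions, each byte of a strip is one pixel, so a contiguous pixel stream is just the strips read in order. The sketch below models the file as a list of bytes and the strips as (offset, byte count) pairs. The name pixelStream is hypothetical, and the library itself seeks within the file instead of holding it as a list.

```haskell
import Data.Word (Word8)

-- Present the pixel matrix as one contiguous stream, given the file
-- contents and the (offset, byte count) of each strip in image order.
-- Strips may sit anywhere in the file and in any physical order.
pixelStream :: [Word8] -> [(Int, Int)] -> [Word8]
pixelStream file strips =
  concat [ take len (drop off file) | (off, len) <- strips ]
```

For a ten-byte file [0..9] with strips at (6, 2) and (1, 3), the resulting stream is the bytes 6, 7, 1, 2, 3.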

dict_read_int :: TIFF_TAG -> TIFFDict -> IterateeGM Word8 RBIO (Maybe Integer)

A few helpers for getting data from the TIFF dictionary