hmatrix-mmap-0.0.5: Memory map Vector from disk into memory efficiently




Functions to represent a Vector on disk in efficient, if unportable, ways.

This module uses memory-mapping, a feature of all modern operating systems, to mirror the disk contents in memory. Memory-mapping a file has several advantages over reading it traditionally:

  • Speed: memory-mapping is often much faster than traditional reading.
  • Memory efficiency: memory-mapped files are loaded into RAM on demand and are easily swapped out, so the program can work with data sets larger than the available RAM, as long as they are accessed carefully.

The caveat to using memory-mapping is that the files become specific to the current architecture, because the data is stored in that architecture's endianness. For more information, see the description in System.IO.MMap.
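The endianness caveat can be made concrete with a small sketch that uses only base, not this package. The helper name nativeBytes is hypothetical. It shows the byte order in which Storable lays out a Word32 on the machine running the code, which is the order a verbatim dump of such values would have on disk.

```haskell
import Data.Word (Word8, Word32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Storable (peekByteOff, poke)

-- | Bytes of a Word32 exactly as Storable places them in memory on
-- the machine running this code (hypothetical helper, base only).
nativeBytes :: Word32 -> IO [Word8]
nativeBytes w = alloca $ \p -> do
  poke p w                        -- store the value in native layout
  mapM (peekByteOff p) [0 .. 3]   -- read it back one byte at a time
```

For the value 0x01020304 a little-endian machine yields the bytes 4, 3, 2, 1 and a big-endian machine yields 1, 2, 3, 4, so a file written on one cannot be mapped meaningfully on the other.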

If you wish to write the contents in a portable fashion, either use the ASCII load and save functions in Numeric.Container, or use the binary serialization in Data.Binary.


Memory-mapping Vector from disk



unsafeMMapVector
  :: forall a. Storable a
  => FilePath               -- Path of the file to map
  -> Maybe (Int64, Int)     -- Nothing to map the entire file into memory, otherwise Just (fileOffset, elementCount)
  -> IO (Vector a)

Map a file into memory (read-only) as a Vector.

It is considered unsafe because changes to the underlying file may (or may not) be reflected in the Vector, which breaks referential transparency.



unsafeLazyMMapVectors
  :: forall a. Storable a
  => FilePath                 -- Path of the file to map
  -> Maybe (Int64, Int64)     -- Nothing to map the entire file into memory, otherwise Just (fileOffset, totalElementCount)
  -> Int                      -- The number of elements in each Vector
  -> IO (Int64, [Vector a])   -- Returns (numberOfVectors, vectors)

Map a file into memory as a lazy list of equal-sized Vectors, even if they can't all fit in the address space at the same time.

 (numVectors,vectors) <- unsafeLazyMMapVectors filename Nothing vectorSize

Commonly, a data file will contain multiple vectors of equal length (that is, a matrix). This function is convenient for those uses, but it plays a more important role: supporting data sets that cannot fit in the address space of the current machine.

On 32-bit machines the address space is only 4 GB, and it is easy to find data sets that are too large to be represented, even in virtual memory.

This function loads the data in chunks. As long as you drop your references to the vectors as you consume the data, each old chunk will be unmapped before the next chunk is mapped.

The number of vectors in the list is returned because it's often needed, yet calculating it using length would demand the whole list.
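The count can be known up front because it follows from sizes alone. The sketch below shows that arithmetic under stated assumptions: the whole file is mapped, and the file holds only whole vectors of a single Storable element type. The name vectorCount is hypothetical, and this is not presented as the library's implementation.

```haskell
import Data.Int (Int64)
import Foreign.Storable (Storable, sizeOf)

-- | Number of whole vectors that fit in a file of the given size.
-- The first argument is only used to pick the element type
-- (hypothetical helper, not part of hmatrix-mmap).
vectorCount :: Storable a => a -> Int64 -> Int -> Int64
vectorCount proxy fileBytes vectorSize =
  fileBytes `div` bytesPerVector
  where
    -- bytes occupied by one vector of vectorSize elements
    bytesPerVector = fromIntegral (sizeOf proxy) * fromIntegral vectorSize
```

With 8-byte Double elements, a file of 8000 bytes split into vectors of 100 elements gives vectorCount (0 :: Double) 8000 100 = 10, without touching any of the data.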

Writing Vector to disk

These functions write the Vector in a way suitable for reading back with unsafeMMapVector.

hPutVector :: forall a. Storable a => Handle -> Vector a -> IO ()

Write out a vector verbatim into an open file handle.

writeVector :: forall a. Storable a => FilePath -> Vector a -> IO ()

Write the vector verbatim to a file.
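As an illustration of what a verbatim on-disk format looks like, here is a base-only sketch that dumps a list of Double values in their native in-memory layout and reads them back. It assumes that "verbatim" means raw element bytes with no header or length prefix. The names writeRaw and readRaw are hypothetical, and the code does not use or reproduce this package's implementation.

```haskell
import Foreign.Marshal.Array (allocaArray, peekArray, withArrayLen)
import Foreign.Storable (sizeOf)
import System.IO (IOMode (..), hGetBuf, hPutBuf, withBinaryFile)

-- | Dump the doubles in native in-memory layout, with no header
-- (hypothetical helper, base only).
writeRaw :: FilePath -> [Double] -> IO ()
writeRaw path xs =
  withBinaryFile path WriteMode $ \h ->
    withArrayLen xs $ \n p ->
      hPutBuf h p (n * sizeOf (0 :: Double))

-- | Read back up to n doubles written by writeRaw on the same
-- architecture; fewer are returned if the file is shorter.
readRaw :: FilePath -> Int -> IO [Double]
readRaw path n =
  withBinaryFile path ReadMode $ \h ->
    allocaArray n $ \p -> do
      got <- hGetBuf h p (n * sizeOf (0 :: Double))
      peekArray (got `div` sizeOf (0 :: Double)) p
```

Because nothing but the element bytes is stored, the reader has to know the element type and count from elsewhere, and the file is only meaningful on an architecture with the same endianness.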