Safe Haskell | None |
---|---|
Language | Haskell2010 |
Data.IDX.Conduit
Description
Streaming (de)serialization and encode-decode functions for the IDX format used in the MNIST handwritten digit recognition dataset [1].
Both sparse and dense decoders are provided. In either case, the range of the data is the same as the raw data (one unsigned byte per pixel).
Links
Synopsis
- sourceIdxLabels :: MonadResource m => (ByteString -> Either e o) -> FilePath -> Maybe Int -> ConduitT () (Either e o) m r
- mnistLabels :: ByteString -> Either String Int
- sourceIdx :: MonadResource m => FilePath -> Maybe Int -> ConduitT () (Vector Word8) m ()
- sourceIdxSparse :: MonadResource m => FilePath -> Maybe Int -> ConduitT () (Sparse Word8) m ()
- sinkIdx :: (MonadResource m, Foldable t) => FilePath -> Int -> t Word32 -> ConduitT (Vector Word8) Void m ()
- sinkIdxSparse :: (Foldable t, MonadResource m) => FilePath -> Int -> t Word32 -> ConduitT (Sparse Word8) Void m ()
- data Sparse a
- sBufSize :: Sparse a -> Int
- sNzComponents :: Sparse a -> Vector (Int, a)
- readHeader :: FilePath -> IO (IDXMagic, Int32, Vector Int32)
Source
Labels
sourceIdxLabels
Arguments
:: MonadResource m | |
=> (ByteString -> Either e o) | parser for the labels, where the bytestring buffer contains exactly one unsigned byte |
-> FilePath | filepath of uncompressed IDX labels file |
-> Maybe Int | optional maximum number of entries to retrieve |
-> ConduitT () (Either e o) m r |
Outputs the labels corresponding to the data
mnistLabels :: ByteString -> Either String Int
Parser for the labels; it can be passed as the first argument to sourceIdxLabels.
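As an illustration of how the two label functions fit together, the following minimal sketch collects the first few labels of a file into a list. It assumes the `conduit` package's `Conduit` module for `runConduitRes`, `sinkList` and `.|`; the helper name `firstLabels` is hypothetical and not part of this module.

```haskell
import Conduit (runConduitRes, sinkList, (.|))
import Data.IDX.Conduit (mnistLabels, sourceIdxLabels)

-- | Hypothetical helper: read at most @n@ labels from an uncompressed
-- IDX labels file. Each element is either a parse error or a label.
firstLabels :: FilePath -> Int -> IO [Either String Int]
firstLabels path n =
  runConduitRes $ sourceIdxLabels mnistLabels path (Just n) .| sinkList
```

For example, `firstLabels "train-labels-idx1-ubyte" 10` would read ten labels, assuming the uncompressed MNIST training labels file is present under that name in the working directory.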
Data
Dense
sourceIdx
Arguments
:: MonadResource m | |
=> FilePath | filepath of uncompressed IDX data file |
-> Maybe Int | optional maximum number of entries to retrieve |
-> ConduitT () (Vector Word8) m () |
Outputs dense data buffers in the 0-255 range
In the case of the MNIST dataset, 0 corresponds to the background of the image.
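A minimal usage sketch for the dense source: it streams every item of a file and counts them without holding the whole dataset in memory. It assumes the `conduit` package's `Conduit` module, and that passing `Nothing` means "no limit" (as the argument description suggests); `countItems` is a hypothetical name.

```haskell
import Conduit (lengthC, runConduitRes, (.|))
import Data.IDX.Conduit (sourceIdx)

-- | Hypothetical helper: count the data items in an uncompressed IDX file.
-- Each item arrives as one dense buffer of unsigned bytes.
countItems :: FilePath -> IO Int
countItems path = runConduitRes $ sourceIdx path Nothing .| lengthC
```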
Sparse
sourceIdxSparse
Arguments
:: MonadResource m | |
=> FilePath | filepath of uncompressed IDX data file |
-> Maybe Int | optional maximum number of entries to retrieve |
-> ConduitT () (Sparse Word8) m () |
Outputs sparse data buffers (i.e. without zero components)
This incurs at least one additional data copy of each vector, but the resulting vectors take up less space.
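To make the trade-off concrete, here is a list-based sketch of the dense-to-sparse idea. `SparseList` and `toSparse` are hypothetical stand-ins written for illustration only; they are not the library's `Sparse` type or its decoder.

```haskell
import Data.Word (Word8)

-- | Hypothetical list-based stand-in for a sparse buffer.
data SparseList = SparseList
  { slSize    :: Int             -- ^ length of the dense buffer, zeros included
  , slNonzero :: [(Int, Word8)]  -- ^ (linear index, value) of each nonzero entry
  } deriving (Eq, Show)

-- | Keep only the nonzero components, remembering where each one was.
toSparse :: [Word8] -> SparseList
toSparse xs =
  SparseList (length xs) [(i, x) | (i, x) <- zip [0 ..] xs, x /= 0]
```

For instance, `toSparse [0, 0, 7, 0, 255]` gives `SparseList 5 [(2, 7), (4, 255)]`: two stored entries instead of five, at the cost of one pass over the dense data.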
Sink
Data
Dense
sinkIdx
Arguments
:: (MonadResource m, Foldable t) | |
=> FilePath | file to write |
-> Int | number of data items that will be written |
-> t Word32 | data dimension sizes |
-> ConduitT (Vector Word8) Void m () |
Warning: this currently writes an incomplete header, which causes the decoder to split the data items at the wrong length. Do not use until https://github.com/ocramz/mnist-idx-conduit/issues/1 is resolved.
Write a dataset to disk
Contents are written as unsigned bytes, so make sure the input data already fits in 8 bits without loss.
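Since the warning above concerns the file header, it may help to recall what a complete IDX header looks like for unsigned-byte data: two zero bytes, the type code 0x08, the number of dimensions, then each dimension size as a big-endian 32-bit integer. The sketch below builds such a header as a plain list of bytes. It is an independent illustration of the format, not the code this library uses, and the names `be32` and `idxHeaderU8` are hypothetical.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word32, Word8)

-- | Big-endian bytes of a 32-bit word, most significant byte first.
be32 :: Word32 -> [Word8]
be32 w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

-- | IDX header for unsigned-byte data with the given dimension sizes.
-- The first dimension is the number of items in the file.
idxHeaderU8 :: [Word32] -> [Word8]
idxHeaderU8 dims =
  [0, 0, 0x08, fromIntegral (length dims)] ++ concatMap be32 dims
```

For the MNIST training images (60000 items of 28 by 28 pixels), `idxHeaderU8 [60000, 28, 28]` yields the 16 bytes `00 00 08 03 00 00 EA 60 00 00 00 1C 00 00 00 1C`. A decoder reading a header with missing dimension sizes would compute the wrong item length, which matches the symptom described in the warning.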
Sparse
sinkIdxSparse
Arguments
:: (Foldable t, MonadResource m) | |
=> FilePath | file to write |
-> Int | number of data items that will be written |
-> t Word32 | data dimension sizes |
-> ConduitT (Sparse Word8) Void m () |
Warning: this currently writes an incomplete header, which causes the decoder to split the data items at the wrong length. Do not use until https://github.com/ocramz/mnist-idx-conduit/issues/1 is resolved.
Write a sparse dataset to disk
Contents are written as unsigned bytes, so make sure the input data already fits in 8 bits without loss.
Types
data Sparse a
Sparse buffer (containing only nonzero entries)
sBufSize :: Sparse a -> Int
total number of entries in the _dense_ buffer, i.e. including the zeros
sNzComponents :: Sparse a -> Vector (Int, a)
nonzero components, together with the linear index into their dense counterpart
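The two fields are enough to rebuild the dense buffer. The following sketch shows the idea on plain lists: the arguments play the roles of `sBufSize` and `sNzComponents`, but `densify` is a hypothetical helper written for illustration and not part of this module.

```haskell
import Data.Maybe (fromMaybe)
import Data.Word (Word8)

-- | Rebuild a dense buffer from its total size and its nonzero
-- components, each paired with its linear index into the dense buffer.
-- Every index that has no stored component becomes 0.
-- Uses a linear lookup per position, so it is only meant for small inputs.
densify :: Int -> [(Int, Word8)] -> [Word8]
densify n nz = [fromMaybe 0 (lookup i nz) | i <- [0 .. n - 1]]
```

For instance, `densify 5 [(2, 7), (4, 255)]` gives `[0, 0, 7, 0, 255]`.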