Safe Haskell | None |
---|---|
Language | Haskell2010 |
Streaming (de)serialization and encode-decode functions for the IDX format used in the MNIST handwritten digit recognition dataset [1].
Both sparse and dense decoders are provided. In either case, the range of the data is the same as the raw data (one unsigned byte per pixel).
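For orientation, here is a minimal sketch of the IDX header as the MNIST format description lays it out: two zero bytes, one type-code byte (0x08 for unsigned bytes), one byte giving the number of dimensions, then one 32-bit big-endian size per dimension. `encodeHeader` and `decodeHeader` are hypothetical helpers written for this illustration with `bytestring` only; they are not part of this module and may differ from how `readHeader` is implemented.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Lazy as BL
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)

-- | Hypothetical helper: build an IDX header for unsigned-byte data (type code 0x08).
encodeHeader :: [Word32] -> BS.ByteString
encodeHeader dims = BL.toStrict . BB.toLazyByteString $
  BB.word8 0 <> BB.word8 0 <> BB.word8 0x08
    <> BB.word8 (fromIntegral (length dims))
    <> foldMap BB.word32BE dims

-- | Hypothetical helper: read back the type code and the dimension sizes.
decodeHeader :: BS.ByteString -> Either String (Word8, [Word32])
decodeHeader bs
  | BS.length bs < 4 = Left "header shorter than 4 bytes"
  | BS.index bs 0 /= 0 || BS.index bs 1 /= 0 = Left "first two bytes must be zero"
  | BS.length rest < 4 * ndims = Left "truncated dimension sizes"
  | otherwise = Right (BS.index bs 2, map dimAt [0 .. ndims - 1])
  where
    ndims = fromIntegral (BS.index bs 3) :: Int
    rest = BS.drop 4 bs
    -- the i-th dimension size occupies bytes 4*i .. 4*i+3 after the magic number
    dimAt i = be32 (BS.take 4 (BS.drop (4 * i) rest))
    -- fold four bytes into one big-endian 32-bit word
    be32 = BS.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

main :: IO ()
main = do
  -- MNIST training images: 60000 items of 28 x 28 pixels
  let hdr = encodeHeader [60000, 28, 28]
  print (BS.unpack (BS.take 4 hdr)) -- prints [0,0,8,3]
  print (decodeHeader hdr)          -- prints Right (8,[60000,28,28])
```

Note that the first dimension is the number of items, so a complete header for image data carries three sizes, not two.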
Synopsis
- sourceIdxLabels :: MonadResource m => (ByteString -> Either e o) -> FilePath -> Maybe Int -> ConduitT () (Either e o) m r
- mnistLabels :: ByteString -> Either String Int
- sourceIdx :: MonadResource m => FilePath -> Maybe Int -> ConduitT () (Vector Word8) m ()
- sourceIdxSparse :: MonadResource m => FilePath -> Maybe Int -> ConduitT () (Sparse Word8) m ()
- sinkIdx :: (MonadResource m, Foldable t) => FilePath -> Int -> t Word32 -> ConduitT (Vector Word8) Void m ()
- sinkIdxSparse :: (Foldable t, MonadResource m) => FilePath -> Int -> t Word32 -> ConduitT (Sparse Word8) Void m ()
- data Sparse a
- sBufSize :: Sparse a -> Int
- sNzComponents :: Sparse a -> Vector (Int, a)
- readHeader :: FilePath -> IO (IDXMagic, Int32, Vector Int32)
Source
Labels
sourceIdxLabels
:: MonadResource m | |
=> (ByteString -> Either e o) | parser for the labels, where the bytestring buffer contains exactly one unsigned byte |
-> FilePath | filepath of uncompressed IDX labels file |
-> Maybe Int | optional maximum number of entries to retrieve |
-> ConduitT () (Either e o) m r |
Outputs the labels corresponding to the data.
mnistLabels :: ByteString -> Either String Int
Parser for the labels; it can be passed as the first argument to sourceIdxLabels.
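To illustrate the shape of parser that sourceIdxLabels expects, the following sketch defines a stand-in with the same type as mnistLabels. `parseLabel` is a hypothetical name, and the 0-9 range check is an assumption of this sketch rather than documented behaviour of mnistLabels.

```haskell
import qualified Data.ByteString as BS

-- | Hypothetical stand-in with the same type as 'mnistLabels'.
-- Accepts a buffer holding exactly one unsigned byte with a digit label 0-9.
parseLabel :: BS.ByteString -> Either String Int
parseLabel bs = case BS.unpack bs of
  [w] | w <= 9    -> Right (fromIntegral w)
      | otherwise -> Left ("label out of range: " ++ show w)
  _ -> Left ("expected exactly 1 byte, got " ++ show (BS.length bs))

main :: IO ()
main = do
  print (parseLabel (BS.pack [7]))  -- prints Right 7
  print (parseLabel (BS.pack [42])) -- prints Left "label out of range: 42"
  print (parseLabel BS.empty)       -- prints Left "expected exactly 1 byte, got 0"
```

Because the parser is an argument, a dataset with a different label encoding only needs a different function of type `ByteString -> Either e o`.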
Data
Dense
sourceIdx
:: MonadResource m | |
=> FilePath | filepath of uncompressed IDX data file |
-> Maybe Int | optional maximum number of entries to retrieve |
-> ConduitT () (Vector Word8) m () |
Outputs dense data buffers with values in the 0-255 range.
In the case of the MNIST dataset, 0 corresponds to the background of the image.
Sparse
sourceIdxSparse
:: MonadResource m | |
=> FilePath | filepath of uncompressed IDX data file |
-> Maybe Int | optional maximum number of entries to retrieve |
-> ConduitT () (Sparse Word8) m () |
Outputs sparse data buffers (i.e. without zero components).
This incurs at least one additional data copy of each vector, but the resulting vectors take up less space.
Sink
Data
Dense
sinkIdx
:: (MonadResource m, Foldable t) | |
=> FilePath | file to write |
-> Int | number of data items that will be written |
-> t Word32 | data dimension sizes |
-> ConduitT (Vector Word8) Void m () |
Warning: this currently produces an incomplete header (the cause has not been identified), so the decoder splits the data items at the wrong length. Do not use until https://github.com/ocramz/mnist-idx-conduit/issues/1 is resolved.
Writes a dataset to disk.
Contents are written as unsigned bytes, so make sure every input value fits in 8 bits without loss.
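One way to guard against silent truncation before handing data to the sink is a checked narrowing conversion. This is a small sketch using `toIntegralSized` from `Data.Bits` in `base`; `toByte` is a hypothetical helper name and is not part of this module.

```haskell
import Data.Bits (toIntegralSized)
import Data.Word (Word8)

-- | Narrow to a byte, failing instead of wrapping around.
toByte :: Int -> Maybe Word8
toByte = toIntegralSized

main :: IO ()
main = do
  -- plain fromIntegral wraps modulo 256 without any warning
  print (fromIntegral (300 :: Int) :: Word8) -- prints 44
  print (toByte 300)                         -- prints Nothing
  -- convert a whole buffer, failing if any single value is out of range
  print (traverse toByte [0, 128, 255])      -- prints Just [0,128,255]
```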
Sparse
sinkIdxSparse
:: (Foldable t, MonadResource m) | |
=> FilePath | file to write |
-> Int | number of data items that will be written |
-> t Word32 | data dimension sizes |
-> ConduitT (Sparse Word8) Void m () |
Warning: this currently produces an incomplete header (the cause has not been identified), so the decoder splits the data items at the wrong length. Do not use until https://github.com/ocramz/mnist-idx-conduit/issues/1 is resolved.
Writes a sparse dataset to disk.
Contents are written as unsigned bytes, so make sure every input value fits in 8 bits without loss.
Types
data Sparse a
Sparse buffer (containing only nonzero entries).
sBufSize :: Sparse a -> Int
Total number of entries in the _dense_ buffer, i.e. including the zeros.
sNzComponents :: Sparse a -> Vector (Int, a)
Nonzero components, each paired with its linear index into the dense counterpart.
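The relationship between the two fields can be sketched with a list-based stand-in. `SparseL`, `sparsify` and `densify` are hypothetical names for this illustration; the actual `Sparse` type stores its components in a `Vector` and its constructor is not shown here.

```haskell
import Data.Word (Word8)

-- | Hypothetical list-based stand-in for 'Sparse':
-- the dense length plus (linear index, value) pairs.
data SparseL a = SparseL { bufSize :: Int, nzComponents :: [(Int, a)] }
  deriving (Eq, Show)

-- | Keep only nonzero entries, remembering their position in the dense buffer.
sparsify :: (Eq a, Num a) => [a] -> SparseL a
sparsify xs = SparseL (length xs) [(i, x) | (i, x) <- zip [0 ..] xs, x /= 0]

-- | Rebuild the dense buffer, filling the gaps with zeros.
-- Assumes the components are sorted by index, as 'sparsify' produces them.
densify :: Num a => SparseL a -> [a]
densify (SparseL n nz) = go 0 nz
  where
    go i comps
      | i >= n = []
      | ((j, x) : rest) <- comps, j == i = x : go (i + 1) rest
      | otherwise = 0 : go (i + 1) comps

main :: IO ()
main = do
  let dense = [0, 0, 17, 0, 255, 0] :: [Word8]
      sp = sparsify dense
  print sp                    -- prints SparseL {bufSize = 6, nzComponents = [(2,17),(4,255)]}
  print (densify sp == dense) -- prints True
```

Storing the dense length alongside the components is what makes the round trip possible: trailing zeros leave no trace in the component list.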