franz: Append-only database

[ bsd3, database, library, program ]

Please see the README on GitHub at https://github.com/fumieval/franz#readme


Versions 0.2.1, 0.3, 0.3.0.1
Change log ChangeLog.md
Dependencies base (>=4.7 && <5), bytestring, cereal, concurrent-resource-map (==0.2.*), containers, cpu, deepseq, directory, fast-builder (>=0.1.2.0 && <0.2), filepath, franz, fsnotify, mtl, network, optparse-applicative, process, retry, sendfile, stm, stm-delay, transformers, unboxed-ref, unordered-containers, vector
License BSD-3-Clause
Copyright Copyright (c) 2019 Fumiaki Kinoshita
Author Fumiaki Kinoshita
Maintainer fumiexcel@gmail.com
Category Database
Home page https://github.com/fumieval/franz#readme
Bug tracker https://github.com/fumieval/franz/issues
Source repo head: git clone https://github.com/fumieval/franz
Uploaded by FumiakiKinoshita at 2020-06-04T00:26:04Z
Distributions NixOS:0.3.0.1
Executables franzd, franz
Downloads 393 total (5 in the last 30 days)
Status Docs available (last success reported on 2020-06-04)


Readme for franz-0.3.0.1


Franz


Franz is an append-only container format, forked from liszt.

Each stream is stored as a pair of concatenated payloads with an array of their byte offsets.
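A minimal in-memory sketch of this layout follows, to show why the offsets array is useful: it gives access to the nth payload without scanning the payloads before it. It assumes, which the README does not state, that each recorded offset marks the byte where a payload ends. `Stream`, `append` and `lookupPayload` are hypothetical names, not the franz API.

```haskell
import qualified Data.ByteString as B
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- | Hypothetical in-memory model of one stream: all payloads concatenated,
-- plus the byte offset at which each payload ends (an assumption).
data Stream = Stream
  { payloads   :: !B.ByteString
  , endOffsets :: !(Seq Int)
  }

emptyStream :: Stream
emptyStream = Stream B.empty Seq.empty

-- | Append a payload: concatenate its bytes and record where it ends.
-- (Copies the whole buffer each time; fine for illustration only.)
append :: B.ByteString -> Stream -> Stream
append p (Stream ps offs) = Stream (ps <> p) (offs |> B.length ps + B.length p)

-- | Fetch the i-th payload by slicing between its start and end offsets;
-- the start is the previous payload's end, or 0 for the first payload.
lookupPayload :: Int -> Stream -> Maybe B.ByteString
lookupPayload i (Stream ps offs) = do
  end <- Seq.lookup i offs
  let start = if i == 0 then 0 else Seq.index offs (i - 1)
  pure (B.take (end - start) (B.drop start ps))
```

Storing end offsets means a payload's start is simply the previous entry, so one array serves both bounds; an empty payload just repeats the previous offset.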


Why not Kafka

  • None of us want to debug or contribute to Kafka.
  • Trying to read from a stream creates the stream (this is a problem because of the way we name our streams and rely on the latest item).
  • A stream cannot be deleted as long as a reader exists.
  • We lack understanding of it (although there is a lot of good documentation out there, which we recommend).
  • Kafka takes a long time to start up after an abnormal shutdown on the server side.
  • It supports clustering, but clustering sometimes makes the whole system less reliable.

Design requirements

  • The writer must be integrated into the application so that no external process, such as a failing server, can make logging fail or block the application.
  • There's a way to archive streams into one file.
  • There's a way to read streams without a server.
  • There's a way to efficiently fetch data within a given period of time.
    • In particular, the server should be able to search by timestamps or gen nums, rather than leaving the client to perform a binary search.
  • The server must not take too long to restart.
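The README does not say how franzd searches by timestamp. One plausible approach, sketched here under the assumption that an index such as a timestamp is non-decreasing along an append-only stream, is a lower-bound binary search that the server runs over the stored index values. `lowerBound` and its `valueAt` accessor are hypothetical.

```haskell
import Data.Int (Int64)

-- | Position of the first item whose index value is >= target, searching the
-- half-open range [0, len). Assumes values are non-decreasing, as timestamps
-- appended in order would be. Returns len if every value is below target.
-- 'valueAt' is a hypothetical accessor, e.g. reading row i of the offsets file.
lowerBound :: (Int -> Int64) -> Int -> Int64 -> Int
lowerBound valueAt len target = go 0 len
  where
    go lo hi
      | lo >= hi = lo
      | valueAt mid < target = go (mid + 1) hi
      | otherwise = go lo mid
      where
        mid = lo + (hi - lo) `div` 2
```

Done on the server, a request such as "everything since timestamp t" costs O(log n) local reads, instead of one network round trip per probe if the client had to bisect.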

Use case

  • Instances of franzd are running on a remote server and a local gateway.
  • The application produces franz files locally using the writer API.
  • On the local gateway, a proxy connects to the remote server and downsamples the file.
  • Clients can connect to the gateway. When needed, they may also connect directly to the remote server.

Format details

The on-disk representation of a franz stream comprises the following files:

  • payloads: concatenated payloads
  • offsets: a sequence of (N+1)-tuples of 64-bit little-endian integers, where N is the number of index names:
    • 0th: the byte offset of the payload
    • nth, n ∈ [1..N]: the value of the nth index listed in indices
  • indices: a line-separated list of index names. An index is a 64-bit little-endian integer attached to each payload.

A stream is stored as a directory containing the files above.

The Franz reader also supports a squashfs image, provided that the content is a valid franz stream.
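To make the layout concrete, here is a minimal sketch of parsing an indices file and encoding and decoding offsets rows, using only the bytestring package. The function names are hypothetical, and the sketch assumes rows are stored back to back with no header; it ignores a trailing partial row, which is one reasonable way to treat a file that is still being appended to.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.Bits (shiftL, (.|.))
import Data.Int (Int64)
import Data.Word (Word64)

-- | Parse an indices file: one index name per line, blank lines ignored.
parseIndexNames :: B.ByteString -> [B.ByteString]
parseIndexNames = filter (not . B.null) . BC.lines

-- | Encode one offsets row: the payload's byte offset followed by one value
-- per index name, each as a 64-bit little-endian integer.
encodeRow :: Int64 -> [Int64] -> BL.ByteString
encodeRow offset values = BB.toLazyByteString (foldMap BB.int64LE (offset : values))

-- | Read a 64-bit little-endian integer from the first 8 bytes.
getInt64LE :: B.ByteString -> Int64
getInt64LE = fromIntegral . foldr step (0 :: Word64) . B.unpack . B.take 8
  where
    step byte acc = acc `shiftL` 8 .|. fromIntegral byte

-- | Split an offsets file into rows of N+1 integers (N = number of index
-- names), pairing each index value with its name. A trailing partial row,
-- e.g. one still being written, is ignored.
decodeRows :: [B.ByteString] -> B.ByteString -> [(Int64, [(B.ByteString, Int64)])]
decodeRows names = go
  where
    n = length names
    rowSize = 8 * (n + 1)
    go bs
      | B.length bs < rowSize = []
      | otherwise =
          case [getInt64LE (B.drop (8 * i) bs) | i <- [0 .. n]] of
            offset : values -> (offset, zip names values) : go (B.drop rowSize bs)
            [] -> []
```

Because every row has the same size, the kth row starts at byte 8 * (N+1) * k, so a reader can jump straight to any item.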

franzd

franzd is a read-only server that follows franz files and serves them over the network. The directories in which it looks for streams are given as command-line arguments, separately for live streams and squashfs images:

franzd --live /path/to/live --archive /path/to/archive

You can obtain a Connection to remote franz files with withConnection. It tries to mount the squashfs image at path; the mount is shared between connections and is unmounted when the last client closes its connection.

withConnection :: (MonadIO m, MonadMask m)
  => String -- host
  -> Int -- port
  -> ByteString -- path
  -> (Connection -> m r) -> m r
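The sharing behaviour described for withConnection (one mount per image, released when the last client leaves) amounts to reference counting. A minimal sketch, assuming hypothetical `acquire` and `release` actions in place of the real mount and unmount. The package lists concurrent-resource-map among its dependencies, but this sketch does not reproduce that library's API or franz's actual implementation.

```haskell
import Control.Concurrent.MVar
import Control.Exception (bracket)
import qualified Data.Map.Strict as M

-- | Hypothetical registry of shared resources: path -> (reference count, handle).
type Registry a = MVar (M.Map FilePath (Int, a))

-- | Run an action with the resource for 'path'. The first user acquires it
-- (e.g. mounts an image), later users reuse it, and the last one to leave
-- releases it (e.g. unmounts). 'acquire' and 'release' are placeholders.
withShared
  :: Registry a
  -> (FilePath -> IO a)   -- acquire, e.g. mount
  -> (a -> IO ())         -- release, e.g. unmount
  -> FilePath
  -> (a -> IO r)
  -> IO r
withShared reg acquire release path = bracket open close
  where
    open = modifyMVar reg $ \m -> case M.lookup path m of
      Just (n, h) -> pure (M.insert path (n + 1, h) m, h)
      Nothing -> do
        h <- acquire path
        pure (M.insert path (1, h) m, h)
    close _ = modifyMVar_ reg $ \m -> case M.lookup path m of
      Just (1, h) -> release h >> pure (M.delete path m)
      Just (n, h) -> pure (M.insert path (n - 1, h) m)
      Nothing -> pure m
```

Acquiring while holding the MVar serialises first users, so two clients arriving at once cannot mount the same image twice; bracket ensures the count is decremented even if the client's action throws.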

fetch hands the response to a continuation; the contents are a list of triples of offsets, index values (SomeIndexMap), and payloads.

import Control.Concurrent.STM (STM)
import qualified Data.ByteString as B
import qualified Data.HashMap.Strict as HM
import Data.Int (Int64)
import GHC.Generics (Generic)

data RequestType = AllItems | LastItem deriving (Show, Generic)

data ItemRef = BySeqNum !Int -- ^ sequential number
  | ByIndex !B.ByteString Int -- ^ index name and value
  deriving (Show, Generic)

data Query = Query
  { reqStream :: !B.ByteString -- ^ name of the stream
  , reqFrom :: !ItemRef -- ^ first item of the requested range
  , reqTo :: !ItemRef -- ^ last item of the requested range
  , reqType :: !RequestType
  } deriving (Show, Generic)

type SomeIndexMap = HM.HashMap B.ByteString Int64

type Contents = [(Int, SomeIndexMap, B.ByteString)]

-- | When it is 'Right', it blocks until the content is available on the server.
type Response = Either Contents (STM Contents)

fetch :: Connection
  -> Query
  -> (STM Response -> IO r)
  -- ^ running the STM action blocks until the response arrives
  -> IO r
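The Response type distinguishes contents that are already available (Left) from an STM action that blocks until the server delivers them (Right), so a client may wait twice: once for the response, then possibly for the contents. The sketch below mirrors the README's types with simplified stand-ins (Contents here is a list of (Int, String) pairs) and shows one way to handle both cases; `awaitContents` is a hypothetical helper, not part of franz.

```haskell
import Control.Concurrent.STM

-- | Simplified stand-in for the README's Contents (really a list of
-- (Int, SomeIndexMap, ByteString) triples).
type Contents = [(Int, String)]

-- | Same shape as the README's Response: Left is available now,
-- Right is an STM action that blocks (retries) until the contents arrive.
type Response = Either Contents (STM Contents)

-- | Hypothetical helper: wait for the response, then, if it is deferred,
-- wait for the contents as well.
awaitContents :: STM Response -> IO Contents
awaitContents getResponse = do
  response <- atomically getResponse
  either pure atomically response
```

Given the signature above, a helper of this shape written against the real Contents type could be passed directly as fetch's continuation, as in `fetch conn query awaitContents`. Running the two waits as separate transactions keeps the Left/Right distinction visible to the caller; an alternative is a single `atomically (getResponse >>= either pure id)`.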

franz CLI: reading

Read the 0th to 9th elements

franz test -r 0:9

Follow a stream

franz test -b _1