data-dispersal: Space-efficient and privacy-preserving data dispersal algorithms.

[ cryptography, data, lgpl, library ] [ Propose Tags ]

This library provides space-efficient (m,n)-information dispersal algorithms (IDAs).

Given a ByteString bstr of length D, we encode bstr as a list fs of n Fragments, each containing a ByteString of length O(D/m). Then, each fragment in fs could be stored on a separate machine for fault-tolerance. Even if up to n-m of these machines crash, we can still reconstruct the original ByteString out of the remaining m fragments. The total space required for the n fragments is O((n/m)*D). Note that m and n are roughly in the same order, so the actual storage overhead for getting good fault-tolerance increases only by a constant factor.

The module Data.IDA contains the basic information dispersal algorithm. The module Crypto.IDA augments the dispersal scheme by combining it with secret sharing, i.e., the knowledge of up to m-1 fragments does not leak any information about the original data. See Crypto.IDA for details.

GHCi Example:

> :m + Data.IDA
> let msg = Data.ByteString.Char8.pack "my really important data"
> let fragments = encode 5 15 msg
-- Now we could distributed the fragments on different sites to add some
-- fault-tolerance.
> let frags' = drop 5 $ take 10 fragments -- let's pretend that 10 machines crashed
> decode frags'
"my really important data"

Fault-Tolerance:

Suppose that we have N machines and encode our data as 2log(N) fragments with reconstruction threshold m = log(N). Let's assume that we store each fragment on a separate machine and each machine fails (independently) with probability at most 0.5.

What is the probability of our data being safe? Pr[ at most n-m machines crash ] >= 1-0.5^(log(N)) = 1-N^(-1).
What is the overhead in terms of space that we pay for this level of fault-tolerance? We have n fragments, each of size D/m, so the total space is n * D/ m = 2D. In other words, we can guarantee that the data survives with high probability by increasing the required space by a constant factor.

This library is based on the following works:

"Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance", by Michael O. Rabin, JACM 1989.
"How to share a secret." by Adi Shamir. In Communications of the ACM 22 (11): 612–613, 1979.
"Secret Sharing Made Short" Hugo Krawczyk. CRYPTO 1993: 136-146

Modules

[Last Documentation]

Crypto
- Crypto.IDA
Data
- Data.IDA
  - Data.IDA.FiniteField
  - Data.IDA.Internal

Downloads

data-dispersal-1.0.0.0.tar.gz [browse] (Cabal source package)
Package description (revised from the package)

Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.

Maintainer's Corner

Package maintainers

PeterRobinson

For package maintainers and hackage trustees

edit package information

Candidates

1.0.0.0, 1.0.0.1, 1.0.0.2

Versions [RSS]	1.0.0.0, 1.0.0.1, 1.0.0.2
Dependencies	AES (>=0.2.9), array (>=0.4.0.1), base (<0), binary (>=0.7.2.1), bytestring (>=0.10.0.2), entropy (>=0.3.2), finite-field (>=0.8.0), matrix (>=0.3.4.0), secret-sharing (>=1.0.0.0), syb (>=0.4.0), vector (>=0.10.11.0) [details]
License	LGPL-2.1-only
Copyright	Peter Robinson 2014
Author	Peter Robinson <peter.robinson@monoid.at>
Maintainer	peter.robinson@monoid.at
Revised	Revision 1 made by HerbertValerioRiedel at 2019-05-06T13:21:52Z
Category	Data, Cryptography
Home page	http://monoid.at/code
Uploaded	by PeterRobinson at 2014-09-08T16:32:19Z
Distributions
Reverse Dependencies	1 direct, 0 indirect [details]
Downloads	2806 total (11 in the last 30 days)
Rating	(no votes yet) [estimated by Bayesian average]
Your Rating	λ λ λ
Status	Docs not available [build log] All reported builds failed as of 2016-12-13 [all 6 reports]