data-dispersal: Space-efficient and privacy-preserving data dispersal algorithms.

[ cryptography, data, lgpl, library ] [ Propose Tags ]

Given a ByteString of length D, we encode the ByteString as a list of n Fragments, each containing a ByteString of length O(D/m). Then, each fragment could be stored on a separate machine to obtain fault-tolerance: Even if all but m of these machines crash, we can still reconstruct the original ByteString out of the remaining m fragments. Note that the total space requirement of the m fragments is m * O(D/m)=O(D), which is clearly space-optimal. The total space required for the n fragments is O((n/m)*D). Note that m and n can be chosen to be of the same order, so the asymptotic storage overhead for getting good fault-tolerance increases only by a constant factor.

GHCi Example:

> :m + Data.IDA
> let msg = Data.ByteString.Char8.pack "my really important data"
> let fragments = encode 5 15 msg
-- Now we could distributed the fragments on different sites to add some
-- fault-tolerance.
> let frags' = drop 5 $ take 10 fragments -- let's pretend that 10 machines crashed
-- Let's look at the 5 fragments that we have left:
> mapM_ (Prelude.putStrLn . show)  frags'
(6,[273,771,899,737,285])
(7,[289,939,612,285,936])
(8,[424,781,1001,322,788])
(9,[143,657,790,157,423])
(10,[314,674,418,888,423])
-- Space-efficiency: Note that the length of each of the 5 fragments is 5
-- and our original message has length 24.
> decode frags'
"my really important data"

Encrypted Fragments:

The module Data.IDA contains an information dispersal algorithm that produces space-optimal fragments. However, the knowledge of 1 or more fragments might allow an adversary to deduce some information about the original data. The module Crypto.IDA combines information dispersal with secret sharing: the knowledge of up to m-1 fragments does not leak any information about the original data.

This could be useful in scenarios where we need to store data at untrusted storage sites: To this end, we store one encrypted fragment at each site. If at most m-1 of these untrusted sites collude, they will still be unable to obtain any information about the original data. The added security comes at the price of a slightly increased fragment size (by an additional constant 32 bytes) and an additional overhead in the running time of the encoding/decoding process. The algorithm is fully described in module Crypto.IDA.

Fault-Tolerance:

Suppose that we have N machines and encode our data as 2log(N) fragments with reconstruction threshold m = log(N). Let's assume that we store each fragment on a separate machine and each machine fails (independently) with probability at most 0.5.

What is the probability of our data being safe? Pr[ at most n-m machines crash ] >= 1-0.5^(log(N)) = 1-N^(-1).
What is the overhead in terms of space that we pay for this level of fault-tolerance? We have n fragments, each of size O(D/m), so the total space is O(n D/ m) = 2D. In other words, we can guarantee that the data survives with high probability by increasing the required space by a constant factor.

This library is based on the following works:

"Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance", by Michael O. Rabin, JACM 1989.
"How to share a secret." by Adi Shamir. In Communications of the ACM 22 (11): 612–613, 1979.
"Secret Sharing Made Short" Hugo Krawczyk. CRYPTO 1993: 136-146

Modules

[Last Documentation]

Crypto
- Crypto.IDA
Data
- Data.IDA
  - Data.IDA.FiniteField
  - Data.IDA.Internal

Downloads

data-dispersal-1.0.0.1.tar.gz [browse] (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

PeterRobinson

For package maintainers and hackage trustees

edit package information

Candidates

1.0.0.0, 1.0.0.1, 1.0.0.2

Versions [RSS]	1.0.0.0, 1.0.0.1, 1.0.0.2
Dependencies	AES (>=0.2.9), array (>=0.4.0.1), base (>=4.6 && <4.7), binary (>=0.7.2.1), bytestring (>=0.10.0.2), entropy (>=0.3.2), finite-field (>=0.8.0), matrix (>=0.3.4.0), secret-sharing (>=1.0.0.0), syb (>=0.4.0), vector (>=0.10.11.0) [details]
License	LGPL-2.1-only
Copyright	Peter Robinson 2014
Author	Peter Robinson <peter.robinson@monoid.at>
Maintainer	peter.robinson@monoid.at
Category	Data, Cryptography
Home page	http://monoid.at/code
Uploaded	by PeterRobinson at 2014-09-20T17:56:33Z
Distributions
Reverse Dependencies	1 direct, 0 indirect [details]
Downloads	2805 total (10 in the last 30 days)
Rating	(no votes yet) [estimated by Bayesian average]
Your Rating	λ λ λ
Status	Docs not available [build log] All reported builds failed as of 2016-12-13 [all 6 reports]