# data-dispersal: Space-efficient and privacy-preserving data dispersal algorithms.

[ cryptography, data, lgpl, library ] [ Propose Tags ]

This library provides space-efficient (m,n)-information dispersal algorithms (IDAs).

Given a ByteString bstr of length D, we encode bstr as a list fs of n Fragments, each containing a ByteString of length O(D/m). Then, each fragment in fs could be stored on a separate machine for fault-tolerance. Even if up to n-m of these machines crash, we can still reconstruct the original ByteString out of the remaining m fragments. The total space required for the n fragments is O((n/m)*D). Note that m and n are roughly in the same order, so the actual storage overhead for getting good fault-tolerance increases only by a constant factor.

The module Data.IDA contains the basic information dispersal algorithm. The module Crypto.IDA augments the dispersal scheme by combining it with secret sharing, i.e., the knowledge of up to m-1 fragments does not leak any information about the original data. See Crypto.IDA for details.

GHCi Example:

> :m + Data.IDA
> let msg = Data.ByteString.Char8.pack "my really important data"
> let fragments = encode 5 15 msg
-- Now we could distributed the fragments on different sites to add some
-- fault-tolerance.
> let frags' = drop 5 \$ take 10 fragments -- let's pretend that 10 machines crashed
> decode frags'
"my really important data"

Fault-Tolerance:

Suppose that we have N machines and encode our data as 2log(N) fragments with reconstruction threshold m = log(N). Let's assume that we store each fragment on a separate machine and each machine fails (independently) with probability at most 0.5.

• What is the probability of our data being safe? Pr[ at most n-m machines crash ] >= 1-0.5^(log(N)) = 1-N^(-1).

• What is the overhead in terms of space that we pay for this level of fault-tolerance? We have n fragments, each of size D/m, so the total space is n * D/ m = 2D. In other words, we can guarantee that the data survives with high probability by increasing the required space by a constant factor.

This library is based on the following works:

• "Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance", by Michael O. Rabin, JACM 1989.

• "How to share a secret." by Adi Shamir. In Communications of the ACM 22 (11): 612–613, 1979.

• "Secret Sharing Made Short" Hugo Krawczyk. CRYPTO 1993: 136-146

Versions [faq] 1.0.0.0, 1.0.0.1, 1.0.0.2 AES (>=0.2.9), array (>=0.4.0.1), base (>1 && <1), binary (>=0.7.2.1), bytestring (>=0.10.0.2), entropy (>=0.3.2), finite-field (>=0.8.0), matrix (>=0.3.4.0), secret-sharing (>=1.0.0.0), syb (>=0.4.0), vector (>=0.10.11.0) [details] LGPL-2.1-only Peter Robinson 2014 Peter Robinson peter.robinson@monoid.at Revision 1 made by HerbertValerioRiedel at 2019-05-06T13:21:52Z Data, Cryptography http://monoid.at/code by PeterRobinson at 2014-09-08T16:32:19Z NixOS:1.0.0.2 2366 total (7 in the last 30 days) (no votes yet) [estimated by Bayesian average] λ λ λ Docs not available All reported builds failed as of 2016-12-13

## Modules

• Crypto
• Crypto.IDA
• Data
• Data.IDA
• Data.IDA.FiniteField
• Data.IDA.Internal