# The data-dispersal package

- Exposed modules use unallocated top-level names: Crypto

This library provides space-efficient (m,n)-information dispersal algorithms (IDAs).

Given a ByteString `bstr` of length `D`, we encode `bstr` as a list `fs` of `n`
`Fragment`s, each containing a ByteString
of length `O(D/m)`. Then, each fragment in `fs` could be stored on a separate
machine for fault-tolerance.
Even if up to `n-m` of these machines crash, we can still reconstruct the original
ByteString out of the remaining m fragments.
The total space required for the n fragments is `O((n/m)*D)`.
Note that `m` and `n` are roughly in the same order, so the actual storage overhead
for getting good fault-tolerance increases only by a constant factor.

The module `Data.IDA` contains the basic information dispersal algorithm. The module
`Crypto.IDA` augments the dispersal scheme by combining it with secret sharing, i.e.,
the knowledge of up to `m-1` fragments does not leak any information about
the original data. See `Crypto.IDA` for details.

*GHCi Example:*

> :m + Data.IDA > let msg = Data.ByteString.Char8.pack "my really important data" > let fragments = encode 5 15 msg -- Now we could distributed the fragments on different sites to add some -- fault-tolerance. > let frags' = drop 5 $ take 10 fragments -- let's pretend that 10 machines crashed > decode frags' "my really important data"

*Fault-Tolerance:*

Suppose that we have `N` machines and encode our data as `2log(N)` fragments
with reconstruction threshold m = `log(N)`.
Let's assume that we store each fragment on a separate machine and each
machine fails (independently) with probability at most 0.5.

What is the probability of our data being safe?

`Pr[ at most n-m machines crash ] >= 1-0.5^(log(N)) = 1-N^(-1).`What is the overhead in terms of space that we pay for this level of fault-tolerance? We have n fragments, each of size D/m, so the total space is

`n * D/ m = 2D.`In other words, we can guarantee that the data survives with high probability by increasing the required space by a constant factor.

This library is based on the following works:

"Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance", by Michael O. Rabin, JACM 1989.

"How to share a secret." by Adi Shamir. In Communications of the ACM 22 (11): 612–613, 1979.

"Secret Sharing Made Short" Hugo Krawczyk. CRYPTO 1993: 136-146

## Properties

Versions | 1.0.0.0, 1.0.0.0, 1.0.0.1, 1.0.0.2 |
---|---|

Dependencies | AES (>=0.2.9), array (>=0.4.0.1), base (==4.6.*), binary (>=0.7.2.1), bytestring (>=0.10.0.2), entropy (>=0.3.2), finite-field (>=0.8.0), matrix (>=0.3.4.0), secret-sharing (>=1.0.0.0), syb (>=0.4.0), vector (>=0.10.11.0) [details] |

License | LGPL-2.1 |

Copyright | Peter Robinson 2014 |

Author | Peter Robinson <peter.robinson@monoid.at> |

Maintainer | peter.robinson@monoid.at |

Category | Data, Cryptography |

Home page | http://monoid.at/code |

Uploaded | Mon Sep 8 16:30:25 UTC 2014 by PeterRobinson |

## Modules

*Crypto*- Crypto.IDA

*Data*- Data.IDA
- Data.IDA.FiniteField
- Data.IDA.Internal

- Data.IDA

## Downloads

- data-dispersal-1.0.0.0.tar.gz [browse] (Cabal source package)
- Package description (included in the package)

#### Maintainers' corner

For package maintainers and hackage trustees