The critbit package

[Tags: bsd3, library]

This package implements crit-bit trees, a key-value container type for storing keys that can be treated as bitstrings (e.g. ByteString and Text).

Compared to the data structures from the containers and unordered-containers packages, you will find that sometimes the functions implemented in this package are faster, sometimes slower.

In many cases, a CritBit tree provides performance close to that of a HashMap, while providing ordered storage and traversal like a Map.


[Skip to ReadMe]

Properties

Versions0.0.0.0, 0.1.0.0, 0.2.0.0
Change logNone available
Dependenciesarray, base (==4.*), bytestring (>=0.9), deepseq, text (>=0.11.2.3), vector [details]
LicenseBSD3
Copyright2013-2014 Bryan O'Sullivan and others
AuthorBryan O'Sullivan <bos@serpentine.com>
MaintainerBryan O'Sullivan <bos@serpentine.com>
CategoryData
Home pagehttps://github.com/bos/critbit
Bug trackerhttps://github.com/bos/critbit/issues
Source repositoryhead: git clone https://github.com/bos/critbit
head: hg clone https://bitbucket.org/bos/critbit
UploadedFri Jul 4 05:40:07 UTC 2014 by BryanOSullivan
DistributionsNixOS:0.2.0.0
Downloads510 total (21 in last 30 days)
Votes
0 []
StatusDocs available [build log]
Successful builds reported [all 1 reports]

Modules

[Index]

Flags

NameDescriptionDefault
developeroperate in developer modeDisabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag. More info

Downloads

Maintainers' corner

For package maintainers and hackage trustees

Readme for critbit-0.2.0.0

Crit-bit trees for Haskell

This is the first purely functional implementation of crit-bit trees that I'm aware of.

A crit-bit tree is a key/value container that allows efficient lookups and ordered traversal for data that can be represented as a string of bits.

This package exists in part with education in mind:

Education aside, crit-bit trees offer some interesting features compared to other key/value container types in Haskell.

Of course crit-bit trees have some downsides, too. For example, building a tree from randomly ordered inputs is somewhat slow, and of course the set of usable key types is small (only types that can be interpreted as bitstrings "for free").

Compared to the most easily findable crit-bit tree code you'll come across that's written in C, the core of this library has a lot less accidental complexity, and so may be easier to understand. It also handles arbitrary binary data that will cause the C library to go wrong.

How to contribute

I've purposely published this package in an incomplete state, and I'd like your help to round it out. In return, you get to learn a little Haskell, have your code reviewed by someone who wants to see you succeed, and contribute to a rather nifty library.

Do you need any prior experience with Haskell to get started? No! All you need is curiosity and the ability to learn from context. Oh, and a github account.

My aim with this library is drop-in API compatibility with the widely used Haskell containers library, which has two happy consequences:

Getting started

If you want to contribute or play around, please use the most modern version of the Haskell Platform.

Once you have the Platform installed, there are just a few more steps.

Set up your local database of known open source Haskell packages.

cabal update

Both the new cabal command and cabal-dev will install to $HOME/.cabal/bin, so put that directory at the front of your shell's search path before you continue.

Get the critbit source.

git clone git://github.com/bos/critbit

Set up a sandbox.

The first time through, you may need to download and install a ton of dependencies, so hang in there.

cd critbit
cabal sandbox init
cabal install \
--enable-tests \
--enable-benchmarks \
    --only-dependencies \
-j

The -j flag above tells cabal to use all of your CPUs, so even the initial build shouldn't take more than a few minutes.

To actually build:

cabal build

Running the test suite

Once you've built the code, you can run the entire test suite fairly quickly. This takes about 30 seconds on my oldish 8-core Mac laptop:

dist/build/tests/tests +RTS -N

(The +RTS -N above tells GHC's runtime system to use all available cores.)

If you're feeling impatient, run a subset of the test suite:

dist/build/tests/tests -t properties/map/bytestring +RTS -N

And if you want to explore, the tests program accepts a --help option. Try it out.

Running benchmarks

It is just as easy to benchmark stuff as to test it.

First, you need a dictionary. If your system doesn't have a file named /usr/share/dict/words, you can download a dictionary here.

If you've downloaded a dictionary, tell the benchmark suite where to find it by setting the WORDS environment variable.

export WORDS=/my/path/to/linuxwords

You can then run benchmarks and generate a report. For instance, this runs every benchmark that begins with bytestring/lookup.

dist/build/benchmarks/benchmarks -o lookup.html \
    bytestring/lookup

Open the lookup.html file in your browser. Here's an example of what to expect.

As with tests, run the benchmarks program with --help if you want to do some exploring.

What your code should look like

Okay, so you've bought into this idea, and would like to try writing a patch. How to begin?

I've generally tried to write commits with a view to being readable, so there are examples you can follow.

For instance, here's the commit where I added the keys function. This commit follows a simple pattern:

Naturally, you'll follow the prevailing coding and formatting style. If you forget, I'll be sad and offer you only a terse "fix your formatting" review, and then you'll be sad too.

What your commits should look like

Please follow the guidelines below, as they make it easier to review your pull request and deal with your commits afterwards.

(If you can't follow the guidelines, there's a good chance I'll ask you to fix your commits and resubmit them.)