The intset package

[Tags: bsd3, library]

This library provides persistent, time and space efficient integer sets implemented as dense big-endian patricia trees with buddy suffixes compaction. In randomized settings this structure expected to be as fast as Data.IntSet from containers, but if a sets is likely to have long continuous intervals it should be much faster.

[Skip to ReadMe]


Change logChangeLog
Dependenciesbase (==4.*), bits-extras (>=0.1.2), bytestring, deepseq [details]
Copyright(c) 2013, Sam Truzjan
AuthorSam Truzjan
MaintainerSam Truzjan <>
Home page
Bug tracker
Source repositoryhead: git clone git://
this: git clone git:// -b master(tag v0.1.1.0)
UploadedSat Dec 7 03:44:16 UTC 2013 by SamTruzjan
Downloads1076 total (32 in last 30 days)
0 []
StatusDocs uploaded by user
Build status unknown [no reports yet]




testingEnable testing stuff and expose internals.DisabledAutomatic

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag. More info


Maintainers' corner

For package maintainers and hackage trustees

Readme for intset-


This package provides efficient integer interval sets.


Persistent... is it trees?

Yes, Radix trees. Trees are balanced by prefix bits, so we have fast merge operations, such as union, intersection and difference. Chris Okasaki and Andrew Gill shows that Patricia tree based integer maps might be order of magnitude faster than Red-Black tree counterparts on this operations. The same apply to integer sets, we just have keys, but don't have values.

That does mean the "dense"?

That means we keep suffixes in bitmaps and we might pack, say 10, integers which lies close together in one bitmap. This optimization doesn't affect execution times for sparse sets, but makes dense sets much more memory efficient — near 10-50 times less space usage depending on machine word size and the actual density of the set. Basically, this let us be 3-4 times less memory efficient comparing with arrays of tightly packed bits, but see...

How suffix compaction is performed?

There are exist a pretty simple algorithm used in memory allocators called "buddy memory allocator". In a nutshell, we have a big block which is splitted by half when we remove from one of the half, and merge then back when we insert. It's somewhat inverse to the ordinary tree approach — in buddy tree we hold more information about elements that it doesn't contain, while in prefix tree we hold more information about elements that it does contain. It's easy to guess that we should do with it — take the two structures then fuse them into one to produce a new structure which perform better.

Indeed, the key idea in the design is right here — we switch forth and back between representations per subtree basis. We intersperse different representations in different tree branches. It's like chameleon:

That is, we just get a structure that dynamically choose the optimal representation depending on density of set. Moreover in best case this lead to huge space savings:

> ppStats (fromList [0..123456])


Bin count: 6
Tip count: 1
Fin count: 6
Size in bytes: 408
Saved space over dense set:  123072
Saved space over bytestring: 11879
Percent saved over dense set:  99.6695821185617%
Percent saved over bytestring: 96.67941727028567%

The ppStats is not an exposed function but you can play with it using cabal-dev ghci.

I don't know if it is an old idea, but this works just fine.

So when this data structure is good choice?

In many situation. It might be used as persistent and compact replacement for bool arrays or Data.IntSet with the following advantages:

How this implementation relate to containers version?

Heavely based. Essentially we just add the buddy interval compaction, but it turns out that some operations becomes more complicated and requires much more effort to implement — in order to maintain the all tree invariants we need to take into account more cases. This is the reason why some operations are not implemented yet (e.g. lack of views), but I hope I'll fix it with the time.


For documentation see haddock generated documentation.

Build Status

Build Status


This library is written and maintained by Sam T.

Feel free to report bugs and suggestions via github issue tracker or the mail.