bitvec: Space-efficient bit vectors

[ bit-vectors, bsd3, data, library, public-domain ]

A newtype over Bool with a better Vector instance.

The vector package represents unboxed arrays of Bool, spending 1 byte (8 bits) per boolean. This library provides a newtype wrapper Bit and a custom instance of unboxed Vector, which packs bits densely, achieving an 8x smaller memory footprint. The performance stays mostly the same; the most significant degradation happens for random writes (up to 10% slower). On the other hand, for certain bulk bit operations Vector Bit is up to 64x faster than Vector Bool.
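
To make "packs bits densely" concrete, here is a minimal sketch of the index arithmetic, assuming 64-bit words. The names wordIndex, bitOffset, readBit, writeBit, bytesAsBool and bytesAsBit are hypothetical, and the words are kept in a plain list for simplicity; this illustrates the idea only and is not the library's internal representation.

```haskell
import Data.Bits (clearBit, setBit, testBit)
import Data.Word (Word64)

-- Hypothetical helpers: a flat bit index is split into the word
-- holding the bit and the bit's position inside that word.
wordIndex :: Int -> Int
wordIndex i = i `div` 64

bitOffset :: Int -> Int
bitOffset i = i `mod` 64

-- Read bit i from a list of packed words.
readBit :: [Word64] -> Int -> Bool
readBit ws i = testBit (ws !! wordIndex i) (bitOffset i)

-- Return a copy of the words with bit i set to the given value.
writeBit :: [Word64] -> Int -> Bool -> [Word64]
writeBit ws i b =
  [ if k == wordIndex i then update w else w | (k, w) <- zip [0 ..] ws ]
  where
    update w
      | b         = setBit w (bitOffset i)
      | otherwise = clearBit w (bitOffset i)

-- Bytes needed for n flags: one byte each versus one bit each
-- (rounded up to whole 64-bit words).
bytesAsBool :: Int -> Int
bytesAsBool n = n

bytesAsBit :: Int -> Int
bytesAsBit n = 8 * ((n + 63) `div` 64)
```

With these definitions, 1024 flags take 1024 bytes when stored one per byte but 128 bytes when packed, which matches the 8x figure.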

Thread safety

Data.Bit is faster, but writes and flips are thread-unsafe. A naive update is not atomic: it reads the whole word from memory, modifies one bit, then writes the whole word back, so concurrent updates to bits in the same word can overwrite each other.
Data.Bit.ThreadSafe is slower (up to 20%), but writes and flips are thread-safe.
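
The read-modify-write hazard can be shown with a small pure simulation. This is a sketch under stated assumptions: the Word64 stands in for one shared memory word, the names sequential and interleaved are hypothetical, and the code models one possible interleaving of two writers rather than running real threads.

```haskell
import Data.Bits (setBit)
import Data.Word (Word64)

-- Thread A sets bit 0 and thread B sets bit 1, one after another.
sequential :: Word64 -> Word64
sequential cell0 =
  let cellA = setBit cell0 0      -- A: read, modify, write back
      cellB = setBit cellA 1      -- B: reads the value A wrote
  in cellB

-- The same two writes, but both threads read before either writes.
interleaved :: Word64 -> Word64
interleaved cell0 =
  let readA   = cell0             -- A reads the word
      readB   = cell0             -- B reads the same, now stale, word
      _afterA = setBit readA 0    -- A writes back; bit 0 is set
      afterB  = setBit readB 1    -- B overwrites it; bit 0 is lost
  in afterB
```

Starting from an all-zero word, sequential yields 3 (both bits set), while interleaved yields 2: the update made by the first writer has been lost.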

Similar packages

bv and bv-little do not offer mutable vectors.
array is memory-efficient for Bool, but lacks a handy Vector interface and is not thread-safe.

Modules

Data
- Data.Bit
  - Data.Bit.ThreadSafe

Flags

Automatic Flags

Name	Description	Default
bmi2	Enable bmi2 instruction set	Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Downloads

bitvec-1.0.0.1.tar.gz (Cabal source package)
Package description (revised from the package)

Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.

Maintainer's Corner

Package maintainers

JamesCook, Bodigrim

Versions	0.1, 0.1.0.1, 0.1.0.2, 0.1.1.0, 0.2.0.0, 0.2.0.1, 1.0.0.0, 1.0.0.1, 1.0.1.0, 1.0.1.1, 1.0.1.2, 1.0.2.0, 1.0.3.0, 1.1.0.0, 1.1.1.0, 1.1.2.0, 1.1.3.0, 1.1.4.0, 1.1.5.0
Change log	changelog.md
Dependencies	base (>=4.8 && <5), bits-extra (>=0.0.0.4 && <0.1), ghc-prim, primitive (>=0.5), semigroups (>=0.8), vector (>=0.11 && <0.13)
License	BSD-3-Clause
Copyright	2019 Andrew Lelechenko, 2012-2016 James Cook
Author	Andrew Lelechenko <andrew.lelechenko@gmail.com>, James Cook <mokus@deepbondi.net>
Maintainer	Andrew Lelechenko <andrew.lelechenko@gmail.com>
Revised	Revision 2 made by Bodigrim at 2022-06-19T20:42:34Z
Category	Data, Bit Vectors
Home page	https://github.com/Bodigrim/bitvec
Source repo	head: git clone git://github.com/Bodigrim/bitvec.git
Uploaded	by Bodigrim at 2019-08-10T19:29:07Z
Distributions	Arch:1.1.3.0, Fedora:1.1.4.0, LTSHaskell:1.1.5.0, NixOS:1.1.5.0, Stackage:1.1.5.0, openSUSE:1.1.5.0
Reverse Dependencies	15 direct, 4902 indirect
Downloads	14001 total (379 in the last 30 days)
Rating	2.25 (votes: 2) [estimated by Bayesian average]
Status	Docs available; last success reported on 2019-08-10

Readme for bitvec-1.0.0.1

bitvec

Quick start

Consider the following (very naive) implementation of the sieve of Eratosthenes. It returns a vector with True at prime indices and False at composite indices.

import Control.Monad
import Control.Monad.ST
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

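eratosthenes :: U.Vector Bool
eratosthenes = runST $ do
  let len = 100
  -- Start by assuming every index is prime.
  sieve <- MU.replicate len True
  -- 0 and 1 are not prime.
  MU.write sieve 0 False
  MU.write sieve 1 False
  -- It suffices to sieve with candidates up to sqrt len.
  forM_ [2 .. floor (sqrt (fromIntegral len))] $ \p -> do
    isPrime <- MU.read sieve p
    when isPrime $
      -- Cross out every proper multiple of the prime p.
      forM_ [2 * p, 3 * p .. len - 1] $ \i ->
        MU.write sieve i False
  U.unsafeFreeze sieve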

We can switch from Bool to Bit just by adding newtype constructors:

import Data.Bit

import Control.Monad
import Control.Monad.ST
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

eratosthenes :: U.Vector Bit
eratosthenes = runST $ do
  let len = 100
  sieve <- MU.replicate len (Bit True)
  MU.write sieve 0 (Bit False)
  MU.write sieve 1 (Bit False)
  forM_ [2 .. floor (sqrt (fromIntegral len))] $ \p -> do
    Bit isPrime <- MU.read sieve p
    when isPrime $
      forM_ [2 * p, 3 * p .. len - 1] $ \i ->
        MU.write sieve i (Bit False)
  U.unsafeFreeze sieve

The Bit-based implementation requires 8x less memory to store the vector. For large sizes this allows crunching more data in RAM without swapping; for smaller arrays it helps the data fit into CPU caches.

> listBits eratosthenes
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]

There are several high-level helpers that digest bits in bulk, which makes them up to 64x faster than their counterparts for Vector Bool. One can query the population count (popcount) of a vector, which here gives the prime-counting function:

> countBits eratosthenes
25

Conversely, one can query the index of the n-th set bit, which here corresponds to the n-th prime number:

> nthBitIndex (Bit True) 10 eratosthenes
Just 29
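
The bulk speedup mentioned above can be illustrated on plain machine words. The following is a minimal sketch, assuming bits packed into 64-bit words held in a list; countNaive and countBulk are hypothetical names and this is not the library's actual implementation. It contrasts testing each bit individually with issuing one population-count operation per word.

```haskell
import Data.Bits (popCount, testBit)
import Data.Word (Word64)

-- Bit by bit: up to 64 individual tests per word.
countNaive :: [Word64] -> Int
countNaive ws = length [ () | w <- ws, k <- [0 .. 63], testBit w k ]

-- In bulk: a single population-count operation per word.
countBulk :: [Word64] -> Int
countBulk = sum . map popCount
```

Both functions return the same count for any input; the bulk version simply does 64 times fewer steps per word, which is where a figure like "up to 64x" can come from.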

One may notice that the order of the inner traversal by i does not matter and be tempted to run it in several parallel threads. In this case it is vital to switch from Data.Bit to Data.Bit.ThreadSafe, because the former is thread-unsafe with regard to writes. There is a moderate performance penalty (up to 20%) for using the thread-safe interface.
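
A parallel inner traversal might look like the sketch below. It assumes that Data.Bit.ThreadSafe exports the same Bit, listBits and vector instances as Data.Bit, and that len is at least 2. The helper inParallel and the name eratosthenesPar are hypothetical, the multiples are split across only two threads, and exceptions in worker threads are not handled.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_, when)
import Data.Bit.ThreadSafe
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

-- Hypothetical helper: run each action in its own thread, wait for all.
inParallel :: [IO ()] -> IO ()
inParallel actions = do
  dones <- forM actions $ \act -> do
    done <- newEmptyMVar
    _ <- forkIO (act >> putMVar done ())
    pure done
  mapM_ takeMVar dones

-- Sieve of Eratosthenes with the inner loop split across two threads.
-- Assumes len >= 2.
eratosthenesPar :: Int -> IO (U.Vector Bit)
eratosthenesPar len = do
  sieve <- MU.replicate len (Bit True)
  MU.write sieve 0 (Bit False)
  MU.write sieve 1 (Bit False)
  forM_ [2 .. floor (sqrt (fromIntegral len :: Double))] $ \p -> do
    Bit isPrime <- MU.read sieve p
    when isPrime $ do
      let multiples = [2 * p, 3 * p .. len - 1]
          (firstHalf, secondHalf) =
            splitAt (length multiples `div` 2) multiples
      -- Both threads may write bits that share a machine word, which is
      -- why the thread-safe module is imported above.
      inParallel
        [ forM_ firstHalf  $ \i -> MU.write sieve i (Bit False)
        , forM_ secondHalf $ \i -> MU.write sieve i (Bit False)
        ]
  -- All worker threads have finished, so freezing in place is safe here.
  U.unsafeFreeze sieve
```

To get actual parallelism with GHC, the program needs to be built with -threaded and run with +RTS -N; for a sieve this small the thread overhead will outweigh any gain, so the sketch is about correctness rather than speed.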