bytestring-lexing: Efficiently parse and produce common integral and fractional numbers.

[ bsd2, bsd3, data, library ] [ Propose Tags ]

The bytestring-lexing package offers extremely efficient ByteString parsers for some common lexemes: namely integral and fractional numbers. In addition, it provides efficient serializers for (some of) the formats it parses.

As of version 0.3.0, bytestring-lexing offers the best-in-show parsers for integral values. (According to the Warp web server's benchmark of parsing the Content-Length field of HTTP headers.) And as of version 0.5.0 it offers (to my knowledge) the best-in-show parser for fractional/floating numbers.

Some benchmarks for this package can be found at: http://code.haskell.org/~wren/bytestring-lexing/bench/html


[Skip to Readme]
Versions [RSS] [faq] 0.1, 0.1.0.1, 0.1.0.2, 0.1.2, 0.2, 0.2.1, 0.3.0, 0.4.0, 0.4.2, 0.4.3, 0.4.3.1, 0.4.3.2, 0.4.3.3, 0.5.0, 0.5.0.1, 0.5.0.2, 0.5.0.7, 0.5.0.8
Change log CHANGELOG
Dependencies base (>=4.5 && <4.16), bytestring (>=0.9.2.1 && <0.12) [details]
License BSD-3-Clause
Copyright Copyright (c) 2012--2021 wren gayle romano, 2008--2011 Don Stewart
Author wren gayle romano, Don Stewart
Maintainer wren@cpan.org
Category Data
Home page https://wrengr.org/
Source repo head: git clone https://github.com/wrengr/bytestring-lexing.git
Uploaded by WrenRomano at 2021-10-17T01:41:17Z
Distributions Debian:0.5.0.2, FreeBSD:0.5.0.2, LTSHaskell:0.5.0.8, NixOS:0.5.0.2, Stackage:0.5.0.8
Downloads 31221 total (459 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Hackage Matrix CI
Docs available [build log]
Last success reported on 2021-10-17 [all 1 reports]

Modules

[Index] [Quick Jump]

Downloads

Maintainer's Corner

For package maintainers and hackage trustees

Candidates


Readme for bytestring-lexing-0.5.0.7

[back to package description]

bytestring-lexing

Hackage version Build Status Dependencies

The bytestring-lexing package offers extremely efficient ByteString parsers for some common lexemes: namely integral and fractional numbers. In addition, it provides efficient serializers for (some of) the formats it parses.

As of version 0.3.0, bytestring-lexing offers the best-in-show parsers for integral values. And as of version 0.5.0 it offers (to my knowledge) the best-in-show parser for fractional/floating numbers. A record of these benchmarks can be found here

Install

This is a simple package and should be easy to install. You should be able to use any of the standard methods to install it.

-- With cabal-install and without the source:
$> cabal install bytestring-lexing

-- With cabal-install and with the source already:
$> cd bytestring-lexing
$> cabal install

-- Without cabal-install, but with the source already:
$> cd bytestring-lexing
$> runhaskell Setup.hs configure --user
$> runhaskell Setup.hs build
$> runhaskell Setup.hs haddock --hyperlink-source
$> runhaskell Setup.hs copy
$> runhaskell Setup.hs register

The Haddock step is optional.

Testing

If you want to run the test suite, use the following standard method (with runhaskell Setup.hs in lieu of cabal, if necessary):

$> cd bytestring-lexing
$> cabal configure --enable-tests --enable-coverage
$> cabal build
$> cabal test --keep-tix-files

The results of the code coverage are in ./dist/hpc/vanilla/html/bytestring-lexing-$VERSION/hpc_index.html. If you're not interested in the coverage of the test suite, then you needn't pass the --enable-coverage nor --keep-tix-files flags. Note that older versions of cabal used the flag name --enable-library-coverage instead of --enable-coverage. And IIRC hpc integration in cabal was broken for ghc-7.6.

Benchmarks

If you want to run the benchmarking code, then do:

$> cd bytestring-lexing/bench
$> cabal configure
$> cabal build
$> for b in isSpace numDigits packDecimal readDecimal readExponential ceilEightThirds; do
       ./dist/build/bench-${b}/bench-${b} -o ${b}.html;
   done && open *.html

Of course, you needn't run all the benchmarking programs if you don't want. Notably, these benchmarks are artefacts of the development of the library. They are not necessarily the most up-to-date reflection of the library itself, nor of other Haskell libraries we've compared against in the past.

Portability

An attempt has been made to keep this library portable. However, we do make use of two simple language extensions. Both of these would be easy enough to remove, but they should not pose a significant portability burden. If they do in fact pose a burden for your compiler, contact the maintainer.

  • ScopedTypeVariables - the decimalPrecision function in Data.ByteString.Lex.Fractional uses ScopedTypeVariables for efficiency; namely to ensure that the constant function decimalPrecision need only compute its result once (per type), and that its result has no data dependency on the proxy argument.

  • BangPatterns - are used to make the code prettier and to "improve" code coverage over the equivalent semantics via the following idiom:

      foo x ... z
          | x `seq` ... `seq` z `seq` False = error "impossible"
          | otherwise = ...
    

    BangPatterns are supported in GHC as far back as version 6.6.1, and are also supported by JHC and UHC. As of 2010, they were not supported by Hugs; but alas Hugs is pretty much dead now.

Changes: Version 0.5.0 (2015-05-06) vs 0.4.3 (2013-03-21)

I've completely overhauled the parsers for fractional numbers.

The old Data.ByteString.Lex.Double and Data.ByteString.Lex.Lazy.Double modules have been removed, as has their reliance on Alex as a build tool. I know some users were reluctant to use bytestring-lexing because of that dependency, and forked their own version of bytestring-lexing-0.3.0's integral parsers. This is no longer an issue, and those users are requested to switch over to using bytestring-lexing.

The old modules are replaced by the new Data.ByteString.Lex.Fractional module. This module provides two variants of the primary parsers. The readDecimal and readExponential functions are very simple and should suffice for most users' needs. The readDecimalLimited and readExponentialLimited are variants which take an argument specifying the desired precision limit (in decimal digits). With care, the limited-precision parsers can perform far more efficiently than the unlimited-precision parsers. Performance aside, they can also be used to intentionally restrict the precision of your program's inputs.

Benchmarks: Version 0.5.0 (2015-05-06)

The Criterion output of the benchmark discussed below, is available here. The main competitors we compare against are the previous version of bytestring-lexing (which already surpassed text and attoparsec/scientific) and bytestring-read which was the previous best-in-show.

The unlimited-precision parsers provide 3.3x to 3.9x speedup over the readDouble function from bytestring-lexing-0.4.3.3, as well as being polymorphic over all Fractional values. For Float/Double: these functions have essentially the same performance as bytestring-read on reasonable inputs (1.07x to 0.89x), but for inputs which have far more precision than Float/Double can handle these functions are much slower than bytestring-read (0.30x 'speedup'). However, for Rational: these functions provide 1.26x to 1.96x speedup compared to bytestring-read.

The limited-precision parsers do even better, but require some care to use properly. For types with infinite precision (e.g., Rational) we can pass in an 'infinite' limit by passing the length of the input string plus one. For Rational: doing so provides 1.5x speedup over the unlimited-precision parsers (and 1.9x to 3x speedup over bytestring-read), because we can avoid intermediate renormalizations. Whether other unlimited precision types would see the same benefit remains an open question.

For types with inherently limited precision (e.g., Float/Double), we could either pass in an 'infinite' limit or we could pass in the actual inherent limit. For types with inherently limited precision, passing in an 'infinite' limit degrades performance compared to the unlimited-precision parsers (0.51x to 0.8x 'speedup'). Whereas, passing in the actual inherent limit gives 1.3x to 4.5x speedup over the unlimited-precision parsers. They also provide 1.2x to 1.4x speedup over bytestring-read; for a total of 5.1x to 14.4x speedup over bytestring-lexing-0.4.3.3!