Shannon-Fano compression algorithm library
Haskell implementation
This library offers a set of functions to compress files into binary code
by applying the Shannon-Fano compression algorithm.
https://en.wikipedia.org/wiki/Shannon%E2%80%93Fano_coding
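For a feel of what the algorithm does, here is a minimal, illustrative sketch of how a Shannon-Fano code table can be built. It is not the library's actual implementation, and the names (codeTable, splitBalanced, ...) are made up for this example:

import Data.List (group, sort, sortBy)
import Data.Ord (Down (..), comparing)

type CodeTable a = [(a, String)]

-- Count how many times each symbol occurs.
frequencies :: Ord a => [a] -> [(a, Int)]
frequencies = map (\g -> (head g, length g)) . group . sort

-- Split a weight-sorted list into two parts of roughly equal total weight.
splitBalanced :: [(a, Int)] -> ([(a, Int)], [(a, Int)])
splitBalanced xs = go [] xs
  where
    total = sum (map snd xs)
    go left (y : ys)
      | 2 * (sum (map snd left) + snd y) <= total + snd y = go (left ++ [y]) ys
    go left rest = (left, rest)

-- Recursively assign '0' to the first part and '1' to the second.
shannonFano :: [(a, Int)] -> CodeTable a
shannonFano []       = []
shannonFano [(x, _)] = [(x, "")]
shannonFano xs       = prefix '0' (shannonFano l) ++ prefix '1' (shannonFano r)
  where
    (l, r)   = splitBalanced xs
    prefix c = map (\(x, w) -> (x, c : w))

-- Build the table starting from the most frequent symbol.
codeTable :: Ord a => [a] -> CodeTable a
codeTable = shannonFano . sortBy (comparing (Down . snd)) . frequencies

-- codeTable "test" == [('t',"0"),('e',"10"),('s',"11")]

The library exposes its own version of this through frequency, code and genCodeTable, as shown in the examples below.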
Installing / Getting started
I'm currently working on getting this library onto Hackage. In the meantime,
you can start using it by building it from source.
$ git clone https://github.com/bolt12/shannon-fano.git
$ cd shannon-fano/
$ cabal configure
$ cabal build
$ cabal install
You can check that it is correctly installed by opening ghci and seeing if you can import the
library.
$ ghci
> import Codec.Compression.ShannonFano
>
Use examples
To get the compressed binary string of a certain value:
import Codec.Compression.ShannonFano
main :: IO ()
main = print . compress frequency $ "test"
And you should see:
> main
Just "010110"
To generate only the coding table of some value:
import Codec.Compression.ShannonFano
main :: IO ()
main = print . genCodeTable . code . frequency $ "test"
And you should see:
> main
[('t',"0"),('e',"10"),('s',"11")]
To make a program that compresses a given file:
import Codec.Compression.ShannonFano
import System.Environment
main :: IO ()
main = do
  [file] <- getArgs
  content <- readFile file
  compressToFile frequency content
And you should get two resulting files:
- out.bin: contains the compressed data in binary form
- decode.dat: contains the decoding table structure
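For illustration, here is a hand-rolled version of the same workflow built only from the pure functions shown earlier. It assumes compress returns a Maybe String and genCodeTable a list of pairs, as the example outputs above suggest; the library's own compressToFile already does this for you (possibly with a different on-disk format), so treat this as a sketch rather than a replacement:
import Codec.Compression.ShannonFano
import System.Environment

main :: IO ()
main = do
  [file] <- getArgs
  content <- readFile file
  -- Build the coding table and the compressed bit string separately.
  let table = genCodeTable (code (frequency content))
  case compress frequency content of
    Nothing   -> putStrLn "compression failed"
    Just bits -> do
      writeFile "out.bin" bits             -- compressed data as a bit string
      writeFile "decode.dat" (show table)  -- decoding table, serialised via Show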
To make a program that decompresses a given binary file:
import Codec.Compression.ShannonFano
import System.Environment
main :: IO ()
main = do
  [decTableF, binaryF] <- getArgs
  decompressFromFile decTableF binaryF ""
And you should get one resulting file:
- result.dat: contains the decompressed data
You can check that it matches the original file:
$ diff result.dat test.txt
$
Testing the compressor program with a lorem ipsum text file of 921 words:
$ time ./compress test.txt
real 0m0.075s
user 0m0.067s
sys 0m0.007s
Compression:
$ ls -lh out.bin test.txt | cut -d " " -f5
5.7K
6.5K
Space saved: ~12%
Testing the compressor program with 1M of random data:
$ base64 /dev/urandom | head -c 1000000 > test2.txt
$ time ./compress test2.txt
real 0m30.411s
user 0m27.930s
sys 0m2.187s
Compression:
$ ls -lh out.bin test2.txt | cut -d " " -f5
4.0M
977K
Space saved: ~ -312% (the output is roughly four times the size of the input)
Testing the compressor program with a 70K file containing repetitions of 5 characters:
$ time ./compress test3.txt
real 0m0.511s
user 0m0.489s
sys 0m0.017s
Compression:
$ ls -lh out.bin test3.txt | cut -d " " -f5
19K
70K
Space saved: ~73%
Decompression:
$ time ./decompress decode.dat out.bin
real 0m0.128s
user 0m0.105s
sys 0m0.023s
Conclusion
As you can see, this algorithm performs quite badly in general, especially on large or random data.
My implementation is also far from efficient; if you have any suggestions on how to improve it,
I'd be more than happy to hear them!
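Some intuition on the random-data result: base64 output draws from roughly 64 near-equally likely symbols, so even an ideal prefix code needs about 6 bits per 8-bit character, a saving of at most ~25%; any overhead in how the bit string is stored quickly eats that margin and turns into expansion. A quick, hypothetical entropy estimate (this helper is not part of the library) makes the point:

import Data.List (group, sort)

-- Estimate the Shannon entropy of a string in bits per symbol:
-- the lower bound on the average code length of any prefix code.
entropy :: String -> Double
entropy s = negate . sum $ map term counts
  where
    counts = map (fromIntegral . length) (group (sort s))
    total  = sum counts
    term c = let p = c / total in p * logBase 2 p

main :: IO ()
main = do
  content <- readFile "test2.txt"  -- the random base64 sample generated above
  print (entropy content)          -- close to 6 bits per symbol

Plain English text usually comes out around 4 to 5 bits per symbol, which is why the lorem ipsum file still compresses a little.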
Contributing
If you'd like to contribute, please fork the repository and use a feature
branch. Pull requests are warmly welcome.
Licensing
The code in this project is licensed under the GPL-3 license.