freq: Are you ready to get freaky?

[ data, library, mit, text ]

This library provides a way to train a model that predicts the "randomness" of an input ByteString.


Versions 0.0.0, 0.1.0.0, 0.1.0.1, 0.1.0.2, 0.1.0.3, 0.1.0.4, 0.1.1
Dependencies base (>=4.9 && <5.0), bytestring, containers, freq, primitive (>=0.6.1)
Tested with ghc ==8.2.1, ghc ==8.2.2, ghc ==8.4.1, ghc ==8.4.2
License MIT
Author Daniel Cartwright
Maintainer dcartwright@layer3com.com
Category Data
Home page https://github.com/chessai/freq
Bug tracker https://github.com/chessai/freq/issues
Source repo head: git clone https://github.com/chessai/freq.git -b master
Uploaded by chessai at 2018-05-24T02:42:33Z
Executables freq-train
Downloads 3491 total (27 in the last 30 days)
Status Docs available
Last success reported on 2018-05-24

Readme for freq-0.1.0.1


freq

About

This is a simple cryptanalytic frequency-analysis tool that uses English character digrams as a probabilistic model for scoring ByteStrings according to their randomness, on a scale from 0 to 1, where 0 is the most random and 1 is the least random.
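
To make the idea concrete, here is a minimal, self-contained Haskell sketch of digram scoring. It is not freq's actual API; it only illustrates the technique: count adjacent character pairs in a training corpus, then score an input by the mean conditional probability of its digrams under those counts. The corpus file name is hypothetical.

    import qualified Data.ByteString.Char8 as BC
    import qualified Data.Map.Strict as M

    -- Digram model: counts of adjacent character pairs in the corpus.
    type Model = M.Map (Char, Char) Double

    -- Count every adjacent pair of characters in the training text.
    train :: BC.ByteString -> Model
    train bs = M.fromListWith (+) [ (p, 1) | p <- zip s (drop 1 s) ]
      where s = BC.unpack bs

    -- Score an input in [0,1]: the mean conditional probability of its
    -- digrams under the model. English-like text scores higher than
    -- uniformly random bytes, matching the 0..1 scale described above.
    score :: Model -> BC.ByteString -> Double
    score model bs
      | null ds   = 0
      | otherwise = sum (map probOf ds) / fromIntegral (length ds)
      where
        s    = BC.unpack bs
        ds   = zip s (drop 1 s)
        -- Total count of digrams beginning with each character.
        rows = M.fromListWith (+) [ (a, n) | ((a, _), n) <- M.toList model ]
        probOf p@(a, _) = case M.lookup a rows of
          Nothing -> 0
          Just t  -> M.findWithDefault 0 p model / t

    main :: IO ()
    main = do
      corpus <- BC.readFile "corpus.txt"  -- hypothetical training file
      let m = train corpus
      print (score m (BC.pack "github"))      -- English-like: higher score
      print (score m (BC.pack "xq7zv9kk3p"))  -- random-looking: lower score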

Uses

I currently use this to validate domain names, so the training data provided consists of about 6.5 megabytes of public-domain 19th- and 20th-century English novels. You can feed 'freq' any training data you like to achieve different results.
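
Continuing the sketch above, swapping in different training data is just a matter of reading different files before training. The file names and the 0.05 cutoff below are illustrative only, not anything freq ships:

    import qualified Data.ByteString.Char8 as BC
    import qualified Data.Map.Strict as M

    -- Train on several corpus files and flag domain labels that score
    -- below a chosen threshold; reuses train/score from the sketch above.
    main :: IO ()
    main = do
      corpora <- mapM BC.readFile ["novels1.txt", "novels2.txt"]
      let m = M.unionsWith (+) (map train corpora)
      mapM_ (check m) [BC.pack "wikipedia", BC.pack "d8f3kq0zzt"]
      where
        check m d = do
          let s = score m d
          putStrLn (BC.unpack d ++ ": " ++ show s
                    ++ (if s < 0.05 then "  (looks random)" else ""))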

Improvements

To further improve the accuracy of this approach, I will experiment with a generalised dynamic n-gram model, trading a small amount of performance for a potentially large gain in accuracy.
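
As a rough sketch of what that generalisation might look like (my speculation, not the planned implementation), the table can be keyed on length-n substrings instead of pairs, with n = 2 recovering the digram model:

    import qualified Data.ByteString.Char8 as BC
    import qualified Data.Map.Strict as M

    -- Possible n-gram generalisation (a sketch, not freq's planned
    -- design). Keying on length-n substrings; n = 2 recovers digrams.
    type NGramModel = M.Map String Double

    -- All length-n substrings of a string.
    ngrams :: Int -> String -> [String]
    ngrams n s = [ take n (drop i s) | i <- [0 .. length s - n] ]

    trainN :: Int -> BC.ByteString -> NGramModel
    trainN n bs = M.fromListWith (+) [ (g, 1) | g <- ngrams n (BC.unpack bs) ]

    -- Scoring generalises likewise: the conditional probability of an
    -- n-gram is its count divided by the count of its (n-1)-long prefix.
    -- 'prefixes' is expected to be trainN (n - 1) over the same corpus.
    scoreN :: Int -> NGramModel -> NGramModel -> BC.ByteString -> Double
    scoreN n model prefixes bs
      | null gs   = 0
      | otherwise = sum (map probOf gs) / fromIntegral (length gs)
      where
        gs = ngrams n (BC.unpack bs)
        probOf g = case M.lookup (take (n - 1) g) prefixes of
          Nothing -> 0
          Just t  -> M.findWithDefault 0 g model / t

A dynamic variant could choose n per input, falling back to shorter prefixes when a longer context has not been seen, which is where the accuracy-for-performance trade-off comes in.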

Further

See my implementation of the Linear Hadamard Spectral Test for a different approach to a similar problem, one with much higher variance in the potentially random data, where a trained approach might be less accurate.