ncd-tree: text similarity search using normalized compression distance and VP trees
This is a package candidate release. It previews how this release will appear once published to the main package index, which can be done via the package's 'maintain' link. Note that publishing a package to the main package index cannot be undone. Consult the package uploading documentation for more information.
Warnings:
- [option-o2] 'ghc-options: -O2' is rarely needed. Check that it is giving a real benefit and not just imposing longer compile times on your users.
- [missing-upper-bounds] On library, these packages miss upper bounds:
  - bytestring
  - heaps
  - vector
  - vector-algorithms
  - zlib

  Please add them. There is more information at https://pvp.haskell.org/
ncd-tree is a Haskell library that implements a data structure and query layer for efficient similarity search based on the Normalized Compression Distance (NCD) metric and Vantage Point (VP) trees. It allows users to store and query large datasets of text documents, enabling fast retrieval of similar texts based on their compressed representations. This library is particularly useful for applications such as document clustering and recommendation systems.

NCD is a parameter-free, universal similarity metric based on information distance, which quantifies how similar two objects are by measuring the length of their compressed concatenation relative to their individual compressed lengths.

VP trees are a type of metric tree that organizes data points in a way that allows for efficient nearest neighbor searches in metric spaces.
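The two ideas in the description can be sketched in a few lines of Haskell. This is an illustration only, not the API of `Data.NCDTree`: the names `compressedSize`, `ncd`, `VPTree`, `build` and `nearest` are hypothetical, and the choice of `Codec.Compression.Zlib` at default settings as the compressor is an assumption (the package depends on `zlib`, but this page does not say how it is used). The sketch uses the standard formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length, and a simple median-split VP tree with triangle-inequality pruning.

```haskell
-- Requires the bytestring and zlib packages.
import qualified Codec.Compression.Zlib as Z
import qualified Data.ByteString.Lazy as BL
import Data.List (sort)

-- | Compressed size in bytes (assumption: zlib, default settings).
compressedSize :: BL.ByteString -> Double
compressedSize = fromIntegral . BL.length . Z.compress

-- | NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
-- zlib output is never empty, so the denominator is positive.
ncd :: BL.ByteString -> BL.ByteString -> Double
ncd x y = (cxy - min cx cy) / max cx cy
  where
    cx  = compressedSize x
    cy  = compressedSize y
    cxy = compressedSize (BL.append x y)

-- | Hypothetical VP tree: a vantage point, a split radius, the points
-- within that radius (inner) and the points beyond it (outer).
data VPTree a
  = Leaf
  | Node a Double (VPTree a) (VPTree a)

-- | Build a tree, taking the first element as the vantage point and
-- the (lower) median distance to it as the split radius.
build :: (a -> a -> Double) -> [a] -> VPTree a
build _ [] = Leaf
build dist (vp : rest) = Node vp mu (build dist inner) (build dist outer)
  where
    tagged = [(dist vp x, x) | x <- rest]
    mu     = median (map fst tagged)
    inner  = [x | (dx, x) <- tagged, dx <= mu]
    outer  = [x | (dx, x) <- tagged, dx > mu]

median :: [Double] -> Double
median [] = 0
median ds = sort ds !! ((length ds - 1) `div` 2)

-- | Nearest neighbour of q, returned with its distance. The far side
-- of a split is visited only if it could hold a closer point.
nearest :: (a -> a -> Double) -> a -> VPTree a -> Maybe (Double, a)
nearest dist q = go Nothing
  where
    go best Leaf = best
    go best (Node vp mu inner outer) =
      let dq = dist q vp
          best1 = case best of
            Just (b, _) | b <= dq -> best
            _                     -> Just (dq, vp)
          (near, far) = if dq <= mu then (inner, outer) else (outer, inner)
          best2  = go best1 near
          radius = maybe (1 / 0) fst best2
      in if abs (dq - mu) <= radius then go best2 far else best2
```

With these definitions, a query over a list of documents `docs` would read `nearest ncd query (build ncd docs)`. The pruning step relies on the triangle inequality; NCD computed with a real compressor satisfies the metric axioms only approximately, so a search of this kind is best treated as a heuristic that may occasionally miss the true nearest neighbour.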
Properties
| Property | Value |
|---|---|
| Versions | 0.1.0.0 |
| Change log | CHANGELOG.md |
| Dependencies | base (>=4.7 && <5), bytestring, heaps, vector, vector-algorithms, zlib |
| License | BSD-3-Clause |
| Copyright | 2025 Marco Zocca |
| Author | Marco Zocca |
| Maintainer | ocramz |
| Category | Data |
| Home page | https://github.com/ocramz/ncd-tree |
| Source repo | head: git clone https://github.com/ocramz/ncd-tree |
| Uploaded | by ocramz at 2025-12-23T13:02:36Z |
Modules
- Data
  - Data.NCDTree
Downloads
- ncd-tree-0.1.0.0.tar.gz (Cabal source package)
- Package description (as included in the package)