token-search

[ library, mit, program, unclassified ]

Versions [RSS] 0.1.0.0
Change log ChangeLog.md
Dependencies aeson (>=1.4.5 && <2), base (>=4.7 && <5), bytestring (>0.10.8 && <1), conduit, hashable, process (>=1.6 && <2), streaming-commons, text (>=1.2.3), token-search, unordered-containers [details]
License MIT
Copyright 2019 Josh Clayton
Author Josh Clayton
Maintainer sayhi@joshuaclayton.me
Home page https://github.com/joshuaclayton/token-search#readme
Bug tracker https://github.com/joshuaclayton/token-search/issues
Source repo head: git clone https://github.com/joshuaclayton/token-search
Uploaded by joshuaclayton at 2019-12-22T02:20:03Z
Distributions
Executables token-search
Downloads 378 total (3 in the last 30 days)
Status Docs available [build log]
Last success reported on 2019-12-22 [all 1 reports]

Readme for token-search-0.1.0.0

token-search

This is a library for efficient substring detection across a codebase.

Motivation

Unused leverages ctags' token generation in conjunction with a tool to search the file system (either ripgrep or The Silver Searcher).

During execution, Unused shells out from Haskell once per unique token and searches the appropriate files for that token. As a result, each file is searched once per token, potentially thousands of times per run, which takes a significant amount of time.

Approach

Instead of searching each file git tracks for each of potentially thousands of tokens, token-search processes each file once.

With the tokens:

rem
or
lo

and the text:

lorem ipsum
dolor sit amet

token-search then:

  1. Builds a trie with the three tokens
  2. Iterates over each character in the text and, for each character:
    • creates a new copy of the trie
    • adds that copy to the list of all non-terminated tries
    • walks each trie in the list by the character
    • maintains a list of terminal nodes as tries are walked
    • increments the count for each token whose terminal node is reached
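
The steps above can be sketched in a few lines of Haskell. This is a minimal illustration under stated assumptions, not the package's actual implementation: the names `Trie`, `insertToken`, `buildTrie`, `step`, and `countTokens` are hypothetical, and it uses `Data.Map.Strict` from containers rather than the unordered-containers types the package depends on. Sharing the immutable root stands in for "creating a new copy of the trie" at each character.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Hypothetical trie node: the token ending here (if any) and child nodes.
data Trie = Trie
  { terminal :: Maybe String
  , children :: Map.Map Char Trie
  }

emptyTrie :: Trie
emptyTrie = Trie Nothing Map.empty

-- Insert one token, marking the final node as terminal for that token.
insertToken :: String -> Trie -> Trie
insertToken token = go token
  where
    go [] node = node { terminal = Just token }
    go (c:cs) node =
      let child = Map.findWithDefault emptyTrie c (children node)
      in node { children = Map.insert c (go cs child) (children node) }

buildTrie :: [String] -> Trie
buildTrie = foldr insertToken emptyTrie

-- Process one character: start a fresh walk at the root, advance every
-- in-progress walk by the character, drop walks with no matching child,
-- and count each terminal node that was reached.
step :: Trie -> ([Trie], Map.Map String Int) -> Char -> ([Trie], Map.Map String Int)
step root (active, counts) c =
  let advanced = [ child
                 | node <- root : active
                 , Just child <- [Map.lookup c (children node)] ]
      found    = [ t | Just t <- map terminal advanced ]
      counts'  = foldl' (\acc t -> Map.insertWith (+) t 1 acc) counts found
  in (advanced, counts')

-- Count occurrences of every token in a single pass over the text.
countTokens :: [String] -> String -> Map.Map String Int
countTokens tokens text = snd (foldl' (step root) ([], Map.empty) text)
  where root = buildTrie tokens
```

With the tokens and text from the example, `countTokens ["rem", "or", "lo"] "lorem ipsum\ndolor sit amet"` yields a map with `lo` counted twice, `or` twice, and `rem` once. The text is traversed only once regardless of how many tokens there are; the per-character cost depends on how many partial matches are alive at that point.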

Install

stack install

Test

stack test

License

Copyright 2019 Josh Clayton. See the LICENSE.