takedouble: duplicate file finder

Tags: bsd3, library, program, utilities

takedouble is a fast duplicate file finder. It filters candidates by file size, then by their first and last 4k chunks, and checks the full contents only of files that pass those filters.


Versions: 0.0.1.1, 0.0.2.0
Change log: CHANGELOG.md
Dependencies: base (>=4.11 && <5), bytestring, directory, extra, filepath, filepattern, takedouble, unix
License: BSD-3-Clause
Copyright: Shae Erisson
Author: Shae Erisson
Maintainer: Shae Erisson
Category: Utilities
Home page: https://github.com/shapr/takedouble
Source repo: head: git clone https://github.com/shapr/takedouble.git
Uploaded: by ShaeErisson at 2022-06-26T17:43:49Z
Distributions: NixOS:0.0.2.0
Executables: takedouble
Downloads: 110 total (8 in the last 30 days)
Status: Docs available
Last success reported on 2022-06-26

Readme for takedouble-0.0.2.0


takedouble

TakeDouble is a duplicate file finder. It first compares each file's size and its first and last 4k, and only then compares the full contents of the remaining candidates to confirm duplicates.
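The staged approach described above can be illustrated with a minimal sketch. This is not takedouble's actual source: the names Fingerprint, fingerprint, groupByM and findDuplicates are hypothetical, the directory walk is left out, and the final stage reads whole files into memory for simplicity, which a real tool would likely avoid for large files. It only assumes base and bytestring.

```haskell
import qualified Data.ByteString as BS
import Data.Function (on)
import Data.List (groupBy, sortOn)
import System.IO

-- | Cheap fingerprint (hypothetical type): file size plus first and last 4k.
data Fingerprint = Fingerprint Integer BS.ByteString BS.ByteString
  deriving (Eq, Ord, Show)

chunkSize :: Int
chunkSize = 4096

-- | Read only the size and two small chunks, never the whole file.
fingerprint :: FilePath -> IO Fingerprint
fingerprint path = withBinaryFile path ReadMode $ \h -> do
  size <- hFileSize h
  firstChunk <- BS.hGet h chunkSize
  -- Seek to the start of the last chunk, clamped to offset 0 for small files.
  hSeek h AbsoluteSeek (max 0 (size - fromIntegral chunkSize))
  lastChunk <- BS.hGet h chunkSize
  pure (Fingerprint size firstChunk lastChunk)

-- | Group paths by a key computed in IO; keep only groups of two or more.
groupByM :: Ord k => (FilePath -> IO k) -> [FilePath] -> IO [[FilePath]]
groupByM key paths = do
  keyed <- mapM (\p -> (\k -> (k, p)) <$> key p) paths
  let groups = groupBy ((==) `on` fst) (sortOn fst keyed)
  pure [map snd g | g <- groups, length g > 1]

-- | Stage 1: cheap fingerprint filter. Stage 2: full-content comparison,
-- applied only to files whose fingerprints collide.
findDuplicates :: [FilePath] -> IO [[FilePath]]
findDuplicates paths = do
  candidates <- groupByM fingerprint paths
  concat <$> mapM (groupByM BS.readFile) candidates
```

The point of the design is that most files are ruled out after reading at most 8k each, so the expensive full read happens only for files that already agree on size and on both chunks.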

How do I make it go?

You can use nix or cabal to build this.

Running cabal build should produce a binary. (Use ghcup to install cabal and the latest GHC version.)

After that, run takedouble <dirname>. For example, takedouble ~/ checks your home directory.

If there are common files you'd like to exclude (such as .git directories), you can pass a glob, and any paths matching it are excluded from the output.

For example:

takedouble <dirname> "**/.git/**"
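As a rough illustration of how such a glob behaves, here is a small sketch using the filepattern package, which appears in the dependency list. The helper excludeMatching is hypothetical, and the README does not say whether takedouble matches the glob against relative or absolute paths or at which stage it filters, so treat this only as a demonstration of the pattern semantics.

```haskell
import System.FilePattern (FilePattern, (?==))

-- | Drop every path that matches the exclusion glob (hypothetical helper).
-- In filepattern, "**" matches any number of path components.
excludeMatching :: FilePattern -> [FilePath] -> [FilePath]
excludeMatching pat = filter (not . (pat ?==))
```

With the pattern "**/.git/**", a path such as proj/.git/config is dropped while proj/src/Main.hs is kept.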

Is it Fast?

On my ThinkPad with six Xeon cores, 128GB RAM, and a 1TB Samsung 970 Pro NVMe (via PCIe 3.0), I can check 34393 uncached files in 6.4 seconds. A second run on the same directory takes 2.8 seconds because the file metadata is then cached in memory.