When a file is annexed, a key is generated from its content and/or metadata. The file checked into git symlinks to the key. This key can later be used to retrieve the file's content (its value).

Multiple pluggable key-value backends are supported, and a single repository can use different ones for different files.

  • SHA256E -- The default backend for new files. This allows verifying that the file content is right, and can avoid duplicates of files with the same content. Its need to generate checksums can make it slower for large files.
  • SHA256 -- Does not include the file extension in the key, which can lead to better deduplication.
  • WORM ("Write Once, Read Many") -- This assumes that any file with the same basename, size, and modification time has the same content. This is the least expensive backend, recommended for really large files or slow systems.
  • SHA512, SHA512E -- Best currently available hash, for the very paranoid.
  • SHA1, SHA1E -- Smaller hash than SHA256 for those who want a checksum but are not concerned about security.
  • SHA384, SHA384E, SHA224, SHA224E -- Hashes for people who like unusual sizes.
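
To make the difference between SHA256 and SHA256E concrete, here is a minimal sketch of how a checksum-style key could be built from a file. The function name annex_style_key is hypothetical, and the key layout BACKEND-s<size>--<hash>[.<ext>] is an assumption not stated on this page. The extension handling is simplified to the last suffix only, and sha256sum from GNU coreutils is assumed to be installed.

```shell
#!/bin/sh
# Sketch only: build a checksum-style key for a file.
# Assumed layout (not stated on this page): BACKEND-s<size>--<sha256>[.<ext>]
annex_style_key() {
    backend=$1   # "SHA256" or "SHA256E"
    file=$2
    size=$(wc -c < "$file" | tr -d ' ')
    hash=$(sha256sum < "$file" | cut -d ' ' -f 1)
    key="${backend}-s${size}--${hash}"
    # The "E" variant keeps the file extension as part of the key.
    case "$backend" in
        *E)
            base=$(basename "$file")
            case "$base" in
                ?*.?*) key="${key}.${base##*.}" ;;
            esac
            ;;
    esac
    printf '%s\n' "$key"
}

# Demo: the same content stored under two different names.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/a.txt"
cp "$tmp/a.txt" "$tmp/b.log"
for f in a.txt b.log; do
    annex_style_key SHA256  "$tmp/$f"
    annex_style_key SHA256E "$tmp/$f"
done
rm -rf "$tmp"
```

In this sketch both files get the same SHA256 key, while their SHA256E keys differ only in the trailing .txt and .log. That is why leaving the extension out of the key can deduplicate better.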

The annex.backends git-config setting can be used to list the backends git-annex should use. The first one listed will be used by default when new files are added.
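
As a small illustration, the setting can be written with git config. This sketch assumes the value is a space-separated list of backend names. The particular backends and their order are only an example.

```shell
# Run inside the repository.
# Assumption: the value is a space-separated list of backend names.
# SHA256E is listed first, so it would be the default for newly added files.
git config annex.backends "SHA256E WORM"

# Show the value that was stored.
git config --get annex.backends
```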

For finer control of what backend is used when adding different types of files, the .gitattributes file can be used. The annex.backend attribute can be set to the name of the backend to use for matching files.

For example, to use the SHA256E backend for sound files, which tend to be smallish and might be modified or copied over time, while using the WORM backend for everything else, you could set in .gitattributes:

* annex.backend=WORM
*.mp3 annex.backend=SHA256E
*.ogg annex.backend=SHA256E

It turns out that (at least on x86-64 machines) SHA512 is faster than SHA256. In some benchmarks I performed [1], SHA256 was 1.8–2.2x slower than SHA1 while SHA512 was only 1.5–1.6x slower.

SHA224 and SHA384 are effectively just truncated versions of SHA256 and SHA512 so their performance characteristics are identical.

[1] time head -c 100000000 /dev/zero | shasum -a 512

Comment by NanoTech Fri Aug 10 00:37:32 2012

In case you came here looking for the URL backend.

The URL backend

Several documents on the web refer to a special "URL backend", e.g. Large file management with git-annex [LWN.net]. Such historical articles will never be updated, yet they still send readers to current pages like this one.

Why a URL backend?

It is interesting because you can:

  • let git-annex rely on the fact that extra copies of some documents are available at any time, from a source that is not a git repository.
  • track these documents like your own with all git features, which opens up some truly marvelous combinations, which this margin is too narrow to contain (Pierre d.F. wouldn't disapprove ;-).

How/Where now?

git-annex used to have a URL backend. It seems that the design changed into a "special remote" feature, not limited to the web. You can now track files available through plain directories, rsync, webdav, some cloud storage, etc, even clay tablets. For details see special remotes.

Comment by Stéphane Thu Jan 3 06:59:35 2013