git-annex can transfer data to and from configured git remotes. Normally those remotes are ordinary git repositories (bare or non-bare; local or remote) that store the file contents in their own annex.
But git-annex also extends git's concept of remotes with several special types of remote. These can be used by git-annex just like any normal remote; they cannot, however, be used by other git commands.
- S3 (Amazon S3, and other compatible services)
- Amazon Glacier
- bup
- directory
- rsync
- webdav
- web
- xmpp
- hook
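For example, a directory special remote is created with `initremote` and then used like any other remote. This is a sketch only: the remote name `mydir` and the path are placeholders, and other remote types take different parameters in place of `directory=`.

```shell
$ git annex initremote mydir type=directory directory=/mnt/annexstore encryption=none
$ git annex copy somefile --to mydir
$ git annex drop somefile    # safe: mydir still holds a copy
```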
The above special remotes can be used to tie git-annex into many cloud services. Here are specific instructions for various cloud services:
- using Amazon S3
- using Amazon Glacier
- Internet Archive via S3
- tahoe-lafs
- using box.com as a special remote
- special remote for IMAP
Unused content on special remotes
Over time, special remotes can accumulate file content that is no longer
referred to by files in git. Normally, unused content in the current
repository is found by running `git annex unused`. To detect unused content
on special remotes, instead use `git annex unused --from`. Example:
    $ git annex unused --from mys3
    unused mys3 (checking for unused data...)
      Some annexed data on mys3 is not used by any files in this repository.
        NUMBER  KEY
        1       WORM-s3-m1301674316--foo
      (To see where data was previously used, try: git log --stat -S'KEY')
      (To remove unwanted data: git-annex dropunused --from mys3 NUMBER)
    $ git annex dropunused --from mys3 1
    dropunused 1 (from mys3...) ok
Similar to a JBOD ("Just a Bunch Of Disks"), this would be Just a Bunch Of Files. I already have a NAS with a file structure conducive to serving media to my TV. However, it's not currently capable of running git-annex locally. It would be great to be able to point annex at a file path there as a remote, much like a web remote created with `git annex addurl`. That way I could safely drop all the files I took with me on my trip, while annex still verifies the content and counts the copy on the NAS as a location.
There are some interesting things to figure out for this to be efficient — for example, checksums of the files. Maybe store those in a metadata file in the directory alongside the files? Or perhaps use the WORM backend by default?
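One low-tech way to get checksums onto such a dumb-files remote — purely a sketch; git-annex defines no such manifest format, and the `.annex-manifest` name is made up for this example — is to keep a `sha256sum`-style manifest next to the files, generated once on a machine that can see them:

```shell
#!/bin/sh
# Sketch: maintain a SHA-256 manifest for a plain directory of files,
# so content can be verified without running git-annex on the NAS.
# ".annex-manifest" is a hypothetical name, not a git-annex convention.
set -e
dir=$(mktemp -d)
printf 'hello\n' > "$dir/movie.mkv"        # stand-in for real media

# Generate the manifest: one "HASH  ./FILE" line per file,
# excluding the manifest itself.
( cd "$dir" && find . -type f ! -name .annex-manifest -exec sha256sum {} + \
    > .annex-manifest )

# Later, verify the files against it.
( cd "$dir" && sha256sum -c .annex-manifest )
```

Verification prints one `OK` line per file, so a periodic cron job on any box with read access could stand in for `git annex fsck`.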
Would it be possible to support Rapidshare as a new special remote? They offer unlimited storage for 6-10€ per month. It would be great for larger backups. Their API can be found here: http://images.rapidshare.com/apidoc.txt
Is there any chance of a special remote that functions as a hybrid of 'web' and 'hook'? At least in theory, it should be relatively simple, since it would only support 'get', and the only meaningful parameters to pass would be the URL and the output file name.
Maybe make it something like `git config annex.myprogram-webhook 'myprogram $ANNEX_URL $ANNEX_FILE'`, and fetching could work by adding a `--handler` or `--type` parameter to `addurl`.
The use case here is anywhere that a simple 'fetch the file over HTTP/FTP/etc' isn't workable — maybe it's on rapidshare and you need to use plowshare to download it; maybe it's a youtube video and you want to use youtube-dl; maybe it's a chapter of a manga and you want to turn it into a CBZ file when you fetch it.
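A sketch of what such a handler might do — everything here is hypothetical: the `choose_handler` function, the URL patterns, and the dispatch-by-URL idea are illustrations of the proposal above, not real git-annex behavior or configuration:

```shell
#!/bin/sh
# Hypothetical dispatcher for the proposed webhook-style remote:
# given a URL, decide which external downloader would fetch it.
# None of these names are real git-annex configuration.
choose_handler() {
  case "$1" in
    *youtube.com/*|*youtu.be/*) echo "youtube-dl" ;;   # video sites
    *rapidshare.com/*)          echo "plowshare"  ;;   # one-click hosters
    *)                          echo "wget"       ;;   # plain HTTP/FTP
  esac
}

choose_handler "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
choose_handler "http://rapidshare.com/files/123/backup.tar"
choose_handler "ftp://example.com/pub/file.iso"
```

The actual 'get' step would then run the chosen program with the URL and the destination file name, e.g. from a hook configured along the lines the comment above suggests.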