This special remote type stores file contents in a bucket in Amazon S3 or a similar service.

See using Amazon S3 and Internet Archive via S3 for usage examples.

configuration

The standard environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are used to supply login credentials for Amazon. You need to set these only when running git annex initremote, as they will be cached in a file only you can read inside the local git repository.
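As a minimal sketch of the step above: the credential values and the remote name "mys3" are placeholders, not real values. The `run` helper is a hypothetical convenience added for this example; it only prints the command unless DRY_RUN=0 is set, so the snippet is safe to paste.

```shell
# Placeholder credentials (assumptions); substitute your own.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLE"
export AWS_SECRET_ACCESS_KEY="example-secret"

# Hypothetical helper: print the command by default,
# and execute it only when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# "mys3" is a hypothetical remote name.
run git annex initremote mys3 type=S3 encryption=shared

# initremote caches the credentials in the local repository,
# so they can be dropped from the environment afterwards.
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
```

Run it with DRY_RUN=0 inside a git-annex repository to actually create the remote.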

A number of parameters can be passed to git annex initremote to configure the S3 remote.

  • encryption - Required. Either "none" to disable encryption (not recommended), or a value that can be looked up (using gpg -k) to find a gpg encryption key that will be given access to the remote, or "shared" which allows every clone of the repository to access the encrypted data (use with caution).

    Note that additional gpg keys can be given access to a remote by rerunning initremote with the new key id. See encryption.

  • embedcreds - Optional. Set to "yes" to embed the login credentials inside the git repository, which allows other clones to also access them. This is the default when gpg encryption is enabled; the credentials are stored encrypted and only those with the repository's keys can access them.

    It is not the default when using shared encryption, or no encryption. Think carefully about who can access your repository before using embedcreds without gpg encryption.

  • datacenter - Defaults to "US". Other values include "EU", "us-west-1", and "ap-southeast-1".

  • storageclass - Default is "STANDARD". If you have configured git-annex to preserve multiple copies, consider setting this to "REDUCED_REDUNDANCY" to save money.

  • host and port - Specify in order to use a different, S3-compatible service.

  • bucket - S3 requires that buckets have a globally unique name, so by default, a bucket name is chosen based on the remote name and UUID. Set this to choose the bucket name yourself.

  • fileprefix - By default, git-annex places files in a tree rooted at the top of the S3 bucket. When this is set, it's prefixed to the filenames used. For example, you could set it to "foo/" in one special remote, and to "bar/" in another special remote, and both special remotes could then use the same bucket.

  • x-amz-* - Parameters with this prefix are passed through as HTTP headers when storing keys in S3.
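To illustrate how bucket and fileprefix combine, here is a sketch of two special remotes sharing one bucket. The remote names "s3foo" and "s3bar" and the bucket name are made up for this example. The `run` helper is hypothetical; it prints each command unless DRY_RUN=0 is set.

```shell
# Hypothetical helper: print the command by default,
# and execute it only when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "$*"; fi
}

# Assumed bucket name; S3 bucket names must be globally unique.
bucket="example-shared-bucket"

# Two special remotes in the same bucket, kept apart by fileprefix.
run git annex initremote s3foo type=S3 encryption=shared \
    bucket="$bucket" fileprefix=foo/
run git annex initremote s3bar type=S3 encryption=shared \
    bucket="$bucket" fileprefix=bar/
```

Other parameters from the list above, such as datacenter=EU or storageclass=REDUCED_REDUNDANCY, can be appended in the same key=value form.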

Just noting that the environment variables ANNEX_S3_ACCESS_KEY_ID and ANNEX_S3_SECRET_ACCESS_KEY seem to have been changed to AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
Comment by Matt Tue May 29 08:40:25 2012
Thanks, I've fixed that. (You could have too.. this is a wiki ;)
Comment by joeyh.name Tue May 29 15:10:46 2012
Thanks! Being new here, I didn't want to overstep my boundaries. I've gone ahead and made a small edit and will do so elsewhere as needed.
Comment by Matt Tue May 29 20:26:33 2012

it'd be really nice being able to configure a S3 remote of the form <bucket>/<folder> (not really a folder, of course, just the usual prefix trick used to simulate folders at S3). The remote = bucket architecture is not scalable at all, in terms of number of repositories.

how hard would it be to support this?

thanks, this is the only thing that's holding us back from using git-annex, nice tool!

Comment by Eduardo Thu Aug 9 06:52:07 2012
I guess this could be useful if you have a lot of buckets already in use at S3, or if you want to be able to have a lot of distinct S3 special remotes. Implemented the fileprefix setting. Note that I have not tested it, beyond checking it builds, since I let my S3 account expire. Your testing would be appreciated.
Comment by joeyh.name Thu Aug 9 14:01:06 2012

Any chance I could bribe you to setup Rackspace Cloud Files support? We are using them and would hate to have a S3 bucket only for this.

https://github.com/rackspace/python-cloudfiles

Comment by alan Thu Aug 23 17:00:11 2012
Joey, I'm curious to understand how future proof an S3 remote is. Can I restore my files without git-annex?
Comment by Eric Sun Jan 20 05:21:50 2013

If encryption is not used, the files are stored in S3 as-is, and can be accessed directly. They are stored in a hashed directory structure with the names of their key used, rather than the original filename. To get back to the original filename, a copy of the git repo would also be needed.

With encryption, you need the gpg key used in the encryption, or, for shared encryption, a symmetric key which is stored in the git repo.

See future proofing for non-S3 specific discussion of this topic.

Comment by joeyh.name Sun Jan 20 16:37:09 2013