git-annex's high-level design is mostly inherent in the data that it stores in git, and alongside git. See internals for details.

See encryption for design of encryption elements.

Assuming you're storing your encrypted annex with me and I with you, our regular cron jobs to verify all data will catch corruption in each other's annexes.

Checksums of the encrypted objects could be optional, mitigating any potential attack scenarios.

It's not only about the cost of setting up new remotes. It would also be a way to keep data in one annex while making it accessible only in a subset of them. For example, I might need some private letters at work, but I don't want my work machine to be able to access them all.

Comment by Richard Tue Apr 5 19:24:17 2011

@Richard the easy way to deal with that scenario is to set up a remote that work can access, and only put in it files work should be able to see. Needing to specify which key a file should be encrypted to when putting it in a remote that supported multiple keys would add another level of complexity which that avoids.

Of course, the right approach is probably to have a separate repository for work. If you don't trust it with seeing file contents, you probably also don't trust it with the contents of your git repository.

Comment by joey Thu Apr 7 15:59:30 2011

New encryption keys could be used for different directories/files/patterns/times/whatever. One could then encrypt this new key for the public keys of other people/machines and push them out along with the actual data. This would allow some level of access restriction or future revocation. git-annex would need to keep track of which files can be decrypted with which keys. I am undecided if that information needs to be encrypted or not.

Encrypted object files should be checksummed in encrypted form so that it's possible to verify integrity without knowing any keys. Same goes for encrypted keys, etc.

Chunking files in this context seems like needless overkill. This might make sense to store a DVD image on CDs or similar, at some point. But not for encryption, imo. Coming up with sane chunk sizes for all use cases is literally impossible and as you pointed out, correlation by the remote admin is trivial.

Comment by Richard Sun Apr 3 16:03:14 2011

I see no use case for verifying encrypted object files w/o access to the encryption key. And possible use cases for not allowing anyone to verify your data.

If there are to be multiple encryption keys usable within a single encrypted remote, than they would need to be given some kind of name (a since symmetric key is used, there is no pubkey to provide a name), and the name encoded in the files stored in the remote. While certainly doable I'm not sold that adding a layer of indirection is worthwhile. It only seems it would be worthwhile if setting up a new encrypted remote was expensive to do. Perhaps that could be the case for some type of remote other than S3 buckets.

Comment by joey Tue Apr 5 14:41:49 2011
Comments on this page are closed.