The tips/special_remotes/hook approach with tahoe-lafs is a good start, but Zooko points out that using Tahoe's directory translation layer incurs O(N^2) overhead as the number of objects grows. Making hash subdirectories in Tahoe is also expensive. Instead, it would be better to use Tahoe as a key/value store directly. The catch is that doing so involves sending the content to Tahoe and getting back a key identifier.
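
For concreteness, here is a minimal sketch of that key/value usage through Tahoe's webapi, assuming a local gateway at the default http://127.0.0.1:3456 (the function names are illustrative, not an existing API). A bare PUT to /uri stores an immutable file and returns its capability string, which is exactly the key identifier that would have to be recorded somewhere:

    # Minimal sketch: Tahoe as a key/value store via its webapi.
    # Assumes a local Tahoe gateway at http://127.0.0.1:3456 (the default);
    # function names are illustrative, not part of any git-annex API.
    import requests

    GATEWAY = "http://127.0.0.1:3456"

    def tahoe_store(data: bytes) -> str:
        # PUT to /uri (no directory involved) stores an immutable file
        # and returns its capability string, e.g. "URI:CHK:...".
        r = requests.put(f"{GATEWAY}/uri", data=data)
        r.raise_for_status()
        return r.text.strip()

    def tahoe_retrieve(cap: str) -> bytes:
        # Content comes back by capability alone -- no directory lookups.
        r = requests.get(f"{GATEWAY}/uri/{cap}")
        r.raise_for_status()
        return r.content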

This would be fairly easy to do as a backend, since a backend can assign its own key names (although it typically does so before the data is stored), but a tahoe-lafs special remote would be more flexible.

To support a special remote, a mapping is needed from git-annex keys to Tahoe keys.

The best place to store this mapping is perhaps as a new field in the location log:

    date present repo-uuid newfields

This way, each remote can store its own key-specific data in the same place as other key-specific data, with minimal overhead.
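
As an illustration (the field layout and names here are hypothetical, not a committed format), such an extended location log line might be parsed like this:

    # Hypothetical sketch of the proposed extended location log line.
    # Layout assumed: timestamp, presence flag, repo uuid, then optional
    # name=value per-remote fields; all names are illustrative.
    def parse_location_log_line(line: str) -> dict:
        date, present, uuid, *extra = line.split()
        return {
            "date": float(date),
            "present": present == "1",
            "uuid": uuid,
            "fields": dict(f.split("=", 1) for f in extra),
        }

    # e.g. a remote recording the Tahoe cap for a key:
    parse_location_log_line(
        "1305644439.0 1 8a748f52-b493-11e0-8f66-002170d25c55 tahoe-cap=URI:CHK:xxx")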

Hm... O(N^2)? I think it just takes O(N). To read an entry out of a directory you have to download the entire directory (and store it in RAM and parse it). The constants are basically "too big to be good but not big enough to be prohibitive", I think. jctang has reported that his special remote hook performs well enough to use, but it would be nice if it were faster.
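
That cost is visible directly in the webapi: a directory listing comes back as a single JSON document, so even a one-child lookup downloads and parses all N entries. A minimal sketch, assuming a local gateway:

    # Looking up one child still fetches the whole directory:
    # GET /uri/$DIRCAP?t=json returns ["dirnode", {..., "children": {...}}].
    import requests

    def lookup_child(dircap: str, name: str,
                     gateway="http://127.0.0.1:3456"):
        nodetype, info = requests.get(f"{gateway}/uri/{dircap}?t=json").json()
        return info["children"].get(name)  # all N entries downloaded and parsed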

The Tahoe-LAFS folks are working on speeding up mutable files, by the way, after which we would be able to speed up directories.

Comment by zooko Tue May 17 15:20:39 2011

Whoops! You'd only told me O(N) twice before...

So this is not too high a priority. I would still like to get the per-remote storage sorted out, since it will probably be needed to convert the URL backend into a special remote, which would in turn allow ripping out the otherwise unused pluggable backend infrastructure.

Update: Per-remote storage is now sorted out, so this could be implemented if it actually made sense to do so.
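
To give a sense of the shape it could take, here is a hedged sketch using the external special remote protocol's per-key state (SETSTATE/GETSTATE): store each key in Tahoe, record the returned capability as per-remote state, and fetch by that capability later. This is a simplified illustration, not the actual implementation; error handling, CHECKPRESENT, REMOVE, and paths containing spaces are all omitted.

    #!/usr/bin/env python3
    # Hedged sketch of an external special remote storing each key in
    # Tahoe and recording the returned capability as per-remote state.
    # Simplified for illustration; not the real tahoe remote.
    import sys
    import requests

    GATEWAY = "http://127.0.0.1:3456"  # assumed local Tahoe gateway

    def send(*words):
        print(" ".join(words), flush=True)

    send("VERSION", "1")
    for line in sys.stdin:
        cmd, *args = line.split()
        if cmd in ("INITREMOTE", "PREPARE"):
            send(cmd + "-SUCCESS")
        elif cmd == "TRANSFER" and args[0] == "STORE":
            key, path = args[1], args[2]
            with open(path, "rb") as f:
                cap = requests.put(f"{GATEWAY}/uri", data=f).text.strip()
            send("SETSTATE", key, cap)  # per-remote key -> cap mapping
            send("TRANSFER-SUCCESS", "STORE", key)
        elif cmd == "TRANSFER" and args[0] == "RETRIEVE":
            key, path = args[1], args[2]
            send("GETSTATE", key)
            cap = sys.stdin.readline().split(None, 1)[1].strip()  # "VALUE <cap>"
            with open(path, "wb") as f:
                f.write(requests.get(f"{GATEWAY}/uri/{cap}").content)
            send("TRANSFER-SUCCESS", "RETRIEVE", key)
        else:
            send("UNSUPPORTED-REQUEST")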

Comment by joey Tue May 17 15:57:33 2011