Several things suggest now would be a good time to reorganize the object directory. This would be annex.version=2. It will be slightly painful for all users, so this should be the last reorg in the foreseeable future.

  1. Remove colons from filenames, for FAT support

  2. Add hashing of the object directories, since some filesystems do suck (like, er, FAT at least :) (Also, may as well hash .git-annex/* while at it; that's what really gets big.)

  3. Add filesize metadata for free space checking. (Currently only present in WORM, and in an ad-hoc way.)

  4. Perhaps use a generic format that will allow further metadata to be added later. For example, "bSHA1,s101111,kf3101c30bb23467deaec5d78c6daa71d395d1879"

    (Probably everything after ",k" should be part of the key, even if it contains the "," separator character. Otherwise an escaping mechanism would be needed.)
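
A minimal sketch of how the key format proposed in item 4 could be built and parsed, written in Python purely for illustration (git-annex itself is written in Haskell). The names `format_key`, `parse_key` and `hashed_subdir` are hypothetical, and the two-level md5 layout in `hashed_subdir` is only an assumption about what the hashing in item 2 might look like, not the scheme that was adopted.

```python
import hashlib


def format_key(backend, size, name):
    """Build a key string such as 'bSHA1,s101111,k<name>'."""
    # Fields before ',k' are comma-separated, so they must not contain ','.
    if "," in backend:
        raise ValueError("backend name must not contain ','")
    return f"b{backend},s{int(size)},k{name}"


def parse_key(key):
    """Split a key into single-letter fields.

    Everything after the first ',k' belongs to the key name, even if it
    contains ',' itself, so no escaping mechanism is needed.
    """
    head, sep, name = key.partition(",k")
    if not sep:
        raise ValueError("missing ',k' field")
    fields = {"k": name}
    for part in head.split(","):
        if not part:
            raise ValueError("empty field")
        fields[part[0]] = part[1:]
    return fields


def hashed_subdir(key):
    """Hypothetical two-level directory prefix derived from the key.

    Assumption: spread objects over subdirectories using the first four
    hex digits of the md5 of the key, e.g. 'ab/cd'.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:2]}/{digest[2:4]}"
```

With the size stored in the `s` field, a free space check (item 3) could read the size straight from the key without touching the object. The format also contains no colons, which fits item 1.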

done now!

Although free space checking is not quite there --Joey

What is the potential time-frame for this change? As I am not using git-annex for production yet, I can see myself waiting to avoid any potential hassle.

Supporting generic metadata seems like a great idea. Though if you are going this path, wouldn't it make sense to avoid metastore for mtime etc and support this natively without outside dependencies?

-- RichiH

Comment by Richard Tue Mar 15 10:08:41 2011
If you support generic metadata, keep in mind that you will need to do conflict resolution. Timestamps may not be synced across all systems, so one option is to keep a log of old metadata, sort it by history, and use the latest entry. That still leaves the case of two incompatible changes, which would probably mean manual conflict resolution. You will probably have thought of this already, but I still wanted to make sure this is recorded. -- RichiH
Comment by Richard Tue Mar 15 21:16:48 2011
Hmm, I added quite a few comments at work, but they are stuck in moderation. Maybe I forgot to log in before adding them. I am surprised this one appeared immediately. -- RichiH
Comment by Richard Tue Mar 15 21:19:25 2011

Well, I spent a few hours playing this evening in the 'reorg' branch in git. It seems to be shaping up pretty well; type-based refactoring in Haskell makes this kind of big systematic change a matter of editing until it compiles. And it compiles and the test suite passes. But so far I've only covered items 1, 3, and 4 on the list, and have yet to deal with upgrades.

I'd recommend you not wait before using git-annex. I am committed to providing upgradability between annexes created with all versions of git-annex, going forward. This is important because we can have offline archival drives that sit unused for years. Git-annex will upgrade a repository to the current standard the first time it sees it, and I hope the upgrade will be pretty smooth. It was not bad for the annex.version 0 to 1 upgrade earlier. The only annoyance with upgrades is that they result in some big commits to git, as every symlink in the repo gets changed and log files get moved to new names.

(The metadata being stored with keys is data that a particular backend can use, and is static to a given key, so there are no merge issues (and it won't be used to preserve mtimes, etc).)

Comment by joey Tue Mar 15 23:22:45 2011

Hashing & segmenting seems to be around the corner, which is nice :)

Is there a chance that you will optionally add mtime to your native metadata store? If yes, I'd rather wait for v2 and use the native system from the start. If not, I will probably set it up tonight.

PS: While posting from work, my comments are held for moderation once again. I am somewhat confused as to why this happens when I can just submit directly from home. And yes, I am using the same auth provider and user in both cases.

Comment by Richard Wed Mar 16 11:51:30 2011

The mtime cannot be stored for all keys. Consider a SHA1 key. The mtime is irrelevant; 2 files with different mtimes, when added to the SHA1 backend, should get the same key.
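
To illustrate the point, here is a small Python sketch (not git-annex code; `sha1_key` and `demo` are hypothetical names) that hashes only file content, so two files with identical bytes but different mtimes end up with the same key.

```python
import hashlib
import os
import tempfile


def sha1_key(path):
    """Return the SHA1 hex digest of a file's content only."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def demo():
    """Create two files with the same content but different mtimes."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        for path, mtime in ((a, 1_000_000_000), (b, 1_300_000_000)):
            with open(path, "wb") as f:
                f.write(b"same content\n")
            os.utime(path, (mtime, mtime))  # set atime and mtime explicitly
        mtimes_differ = os.path.getmtime(a) != os.path.getmtime(b)
        return sha1_key(a), sha1_key(b), mtimes_differ
```

Calling `demo()` returns the two keys and a flag showing that the mtimes differ. The keys are equal, so an mtime stored inside the key would have no single correct value.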

Probably our spam filter doesn't like your work IP.

Comment by joey Wed Mar 16 12:32:52 2011

Ah, OK. I assumed the metadata would be attached to a key, not part of the key. This seems to make upgrades/extensions down the line harder than they need to be, but you are right that this way, merges are not, and never will be, an issue.

Though with the SHA1 backend, changing files can be tracked. This means that tracking changes in mtime or other metadata is possible. It also means that there are potential merge issues. But I won't argue the point endlessly. I can accept design decisions :)

The prefix at work is from a university netblock so yes, it might be on a few hundred proxy lists etc.

Comment by Richard Wed Mar 16 17:05:38 2011