I'm soliciting ideas for new small features that would let git-annex do things that currently have to be done manually.

Here are a few I've been considering:


  • --numcopies would be a useful command line switch.

    Update: Added. Also allows for things like git annex drop --numcopies=2 when in a repo that normally needs 3 copies, if you need to urgently free up space.

  • A way to make drop and other commands temporarily trust a given remote, or possibly all remotes.

    Combined, this would allow git annex drop --numcopies=2 --trust=repoa --trust=repob to remove files that have been replicated out to the other 2 repositories, which could be offline. (Slightly unsafe, but in this case the files are podcasts, so not really.)

Update: done --Joey


wishlist: git-annex replicate suggests some way for git-annex to have the smarts to copy content around on its own to ensure numcopies is satisfied. I'd be satisfied with a git annex copy --to foo --if-needed-by-numcopies.

In contrast to the "basic" solution, I would love to have a git annex distribute which is smart enough to simply distribute all data according to certain rules. My ideal, personal use case for the next holidays, where I will have two external disks, several SD cards with 32 GB each, and a local disk with 20 GB (yes....), would be:

    cd ~/photos.annex   # this repository does not have any objects!
    git annex inject --bare /path/to/SD/card
        # adds softlinks, but does **not** add anything to the index;
        # it would calculate checksums (if enabled) and have to keep a
        # temporary location list, though
    git annex distribute
        # checks the config: it would see that my two external disks have
        # a low cost whereas the two remotes have a higher cost
        # check numcopies: it's 3
        # copy to external disk one (cost x)
        # copy to external disk two (cost x)
        # copy to remote one (cost x * 2)
        # remove file from temporary tracking list
    git annex fsck      # everything ok. yay!
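The distribute step described above could boil down to a greedy plan: for each file, copy to the cheapest remotes that do not yet hold it, until numcopies is met. A minimal Python sketch of that planning logic (the repo names, costs, and file names are made-up examples, not anything git-annex produces):

```python
def plan_copies(numcopies, remote_costs, locations):
    """Return {file: [remotes to copy it to]} so each file reaches numcopies.

    remote_costs: {remote: cost}; locations: {file: set of repos holding it}.
    """
    plan = {}
    for name, present in locations.items():
        needed = numcopies - len(present)
        if needed <= 0:
            continue  # already has enough copies
        # cheapest remotes that do not yet hold the file
        candidates = sorted(
            (r for r in remote_costs if r not in present),
            key=lambda r: remote_costs[r],
        )
        plan[name] = candidates[:needed]
    return plan

# Made-up repos and costs: two cheap external disks, two pricier remotes.
costs = {"disk1": 100, "disk2": 100, "remote1": 200, "remote2": 200}
here = {"IMG_001.jpg": {"sdcard"}, "IMG_002.jpg": {"sdcard", "disk1"}}
print(plan_copies(3, costs, here))
# → {'IMG_001.jpg': ['disk1', 'disk2'], 'IMG_002.jpg': ['disk2']}
```

A real implementation would of course also have to respect trust levels and free space on each target.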

Come to think of it, the inject --bare thing is probably not a microfeature. Should I add a new wishlist item for that? -- RichiH

I've thought about such things before; it does not seem really micro, and I'm unsure how well it would work, but it would be worth a todo. --Joey


Along similar lines, it might be nice to have a mode where git-annex tries to fill a disk, up to the annex.diskreserve, with files, preferring files that have relatively few copies. Then, as storage prices continue to fall, new large drives could just be plopped in and git-annex used to fill them up in a way that improves the overall redundancy, without needing to manually pick and choose files.
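The selection policy described here can be sketched in a few lines: order candidate files by how few copies they have, then greedily take what fits in the space left after the reserve. A hedged Python illustration (file names, sizes, and copy counts are invented):

```python
def fill_plan(files, free_bytes, reserve_bytes):
    """Pick files to copy onto a new drive, fewest-copies first.

    files: list of (name, size, copies) tuples.
    Stops before eating into the reserve (the annex.diskreserve idea).
    """
    budget = free_bytes - reserve_bytes
    chosen = []
    # fewest copies first; among equals, biggest files first
    for name, size, copies in sorted(files, key=lambda f: (f[2], -f[1])):
        if size <= budget:
            chosen.append(name)
            budget -= size
    return chosen

# Made-up file list: (name, size in MB, current copy count)
files = [("a.iso", 700, 1), ("b.iso", 700, 3), ("c.iso", 300, 1), ("d.iso", 200, 2)]
print(fill_plan(files, 1300, 100))
# → ['a.iso', 'c.iso', 'd.iso']
```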


If a remote could send received files on to another remote, I could use my own local bandwidth efficiently while still having my git-annex repos replicate data. -- RichiH


Really micro:

% grep annex-push .git/config
    annex-push = !git pull && git annex add . && git annex copy . --to origin --fast --quiet && git commit -a -m "$HOST $(date +%F--%H-%M-%S-%Z)" && git push
%

-- RichiH --Joey

I've been longing for an automated way of removing references to a remote when I know the exact UUID I want to remove. For example, I have lost a portable HDD to a destructive process, and now want to delete all references to the copies of data that were on that disk. If this feature does not already exist, I would love to see it implemented.
Comment by Jimmy Wed Jun 1 13:36:50 2011
@jimmy: see what to do when you lose a repository. I have not seen a convincing argument that removing the location tracking data entirely serves any purpose.
Comment by joey Wed Jun 1 16:24:33 2011

This was already asked here, but I have a use case where I need to unlock with the files being hardlinked instead of copied (my fs does not support CoW), even though 'git annex lock' is now much faster ;-) . The idea is that 1) I want the external world to see my repo "as if" it weren't annexed (because of its own limitations in dealing with soft links), and 2) I know what I'm doing, and am sure that the files will only be read, never written to.

My case is: the repo contains a snapshot A1 of a certain remote directory. Later, I want to rsync that dir into a new snapshot A2. Of course, I want to transfer only new or changed files, using rsync's --copy-dest=A1 (or --compare-dest) option. Unfortunately, rsync won't recognize the soft links from git-annex, and will re-transfer everything.

Maybe I'm overusing git-annex ;-) but still, I find it a legitimate use case, and even though there are workarounds (I don't even remember what I had to do), it would be much more straightforward to have 'git annex unlock --readonly' (or '--readonly-unsafe'?)... or to have rsync take soft links into account, but I did not see rsync's author ask for microfeature ideas :) (it was discussed there, and only some convoluted workarounds were proposed). Thanks.

Comment by Rafaël Thu Jun 2 07:34:42 2011

Before dropping unused items, I sometimes want to check the content of the files manually. But currently, given e.g. a SHA1 key, I don't know how to find the corresponding file, except with 'find .git/annex/objects -type f -name 'SHA1-s1678--70....'', which is too slow (I'm in a case where "git log --stat -S'KEY'" won't work, either because it is too slow or because the file was never committed). By the way, is it documented somewhere how to determine the 2 (nested) sub-directories in which a given (by name) object is located?

So I would like 'git-annex unused' to be able to give me the list of paths to the unused items. Also, I would really appreciate a command like 'git annex unused --log NUMBER [NUMBER2...]' which would run the suggested "git log --stat -S'KEY'" for me, where NUMBER is taken from the 'git annex unused' output. Thanks.

Comment by Rafaël Thu Jun 2 07:55:58 2011
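In the meantime, the NUMBER-to-KEY mapping can be scraped from the 'git annex unused' output and each key fed to "git log --stat -S'KEY'" by hand. A small Python sketch of the scraping; note that the sample output below is a made-up assumption modeled on typical git-annex versions, not captured output:

```python
import re

def unused_keys(output):
    """Map the numbers printed by 'git annex unused' to their keys."""
    mapping = {}
    for line in output.splitlines():
        # assumed format: whitespace, a number, whitespace, the key
        m = re.match(r"\s*(\d+)\s+(\S+)\s*$", line)
        if m:
            mapping[int(m.group(1))] = m.group(2)
    return mapping

# Made-up sample in the assumed output format:
sample = """unused . (checking for unused data...)
  Some annexed data is no longer used by any files:
    NUMBER  KEY
    1       SHA1-s1678--70c07f9a
    2       SHA1-s2040--8ab3e1cd
"""
print(unused_keys(sample))
# → {1: 'SHA1-s1678--70c07f9a', 2: 'SHA1-s2040--8ab3e1cd'}
```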
ps: concerning the command 'find .git/annex/objects -type f -name 'SHA1-s1678--70....'' from my previous comment, it is "significantly" faster to search for the containing directory, which has the same name: 'find .git/annex/objects -maxdepth 2 -mindepth 2 -type d -name 'SHA1-s1678--70....''. I am just curious: why does each file object need its own directory, itself nested under two more sub-directories?
Comment by Rafaël Thu Jun 2 15:51:49 2011
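On the directory-layout question: a fixed two-level fan-out derived from a hash of the key bounds how many entries any single directory holds, which keeps lookups fast on filesystems that slow down with huge directories; the per-key directory then holds the object file itself. The sketch below only illustrates the idea, and is not git-annex's actual hashing scheme:

```python
import hashlib

def fanout_dirs(key, width=3):
    """Derive two nested directory names from a hash of an annex key.

    Illustrative only: git-annex's real layout is similar in spirit
    but differs in its details.
    """
    digest = hashlib.md5(key.encode()).hexdigest()
    return digest[:width], digest[width:2 * width]

d1, d2 = fanout_dirs("SHA1-s1678--70c07f9a")  # made-up key
print(f".git/annex/objects/{d1}/{d2}/KEY/KEY")
```

Because the levels come from a hash, objects spread evenly no matter what the keys look like, and any tool can recompute the path from the key alone.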
I'm not sure it is worth adding a command for such a small feature, but I would certainly use it: have something like "git annex fetch remote" do "git fetch remote && git annex copy --from=remote", and "git annex push remote" do "git push remote && git annex copy --to=remote". And maybe the same for a pull operation?
Comment by Rafaël Sun Jul 3 10:39:41 2011

My last comment is a bit confused. The "git fetch" command gets all the information from a remote, so that it is then possible to merge while offline (without access to the remote). I would like a "git annex fetch remote" command that gets all annexed files from the remote, so that if I later merge with it, all annexed files are already here. And "git annex fetch" could (optionally) call "git fetch" before getting the files.

It also seems that in my last post I should have written "git annex get --from=remote" instead of "git annex copy --from=remote", because "annex copy --from" copies all files, even those the local repo already has (is this the case? if so, when is it useful?).

Comment by Rafaël Sun Jul 3 13:57:00 2011
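Until such commands exist, git aliases in the same spirit as the annex-push one above could approximate the fetch/push pairing. An untested sketch for .git/config (the f() wrapper is only there to pass the remote name through; the alias names are invented):

```
[alias]
    annex-fetch = "!f() { git fetch \"$1\" && git annex get --from=\"$1\"; }; f"
    annex-send  = "!f() { git push \"$1\" && git annex copy --to=\"$1\"; }; f"
```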
Comments on this page are closed.