Hi, a few years ago I wrote a tool called 'ddm'. The code is overengineered and the script is more complicated than it should be, but I think it demonstrates some good use cases, and I wonder how well git-annex can fulfill the requirements for those use cases. Maybe I should remove ddm and start hacking with git-annex instead.

To answer this question, you should read the section about the possible dataset types on http://dieter.plaetinck.be/ddm_a_distributed_data_manager.html, and the example at the bottom of that page. It demonstrates the idea behind the "selection" dataset: always try to keep a subset of files (the most appropriate ones, based on the output of some script) "checked out". The introduction section on https://github.com/Dieterbe/ddm/raw/358f7cf92c0ba7b336dc97638351d4e324461afa/MANUAL should further clarify things, as well as give some more good use cases (as you can see, it's a bit more about [semi-]automated workflows than purely tracking what's where).

So I'm not sure, maybe the way to go for me is to make git-annex my "housekeeping about which data is where" backend and make ddm into a set of policies and tools on top of git-annex.

Any input?

Thanks, Dieter

Yes, there is value in layering something over git-annex to use a policy to choose what goes where.

I use mr to update and manage all my repositories, and since mr can be made to run arbitrary commands when doing e.g. an update, I use its config file as such a policy layer. For example, my podcasts are pulled into my sound repository in a subdirectory; boxes that consume podcasts run "git pull; git annex get podcasts --exclude='*/out/*'; git annex drop podcasts/*/out". I move podcasts to "out" directories once done with them (I have yet to teach mpd to do that for me..), and the next time I run "mr update" to update everything, it pulls down new ones and removes old ones.
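A minimal sketch of that policy as a small script, in case a policy layer ends up living outside mr. The function names `podcast_policy` and `run_policy`, the parameters, and the assumption that the exclude pattern is the glob `*/out/*` are all illustrative and not part of mr or git-annex; only the three commands themselves come from the description above.

```python
import shlex
import subprocess


def podcast_policy(directory="podcasts", done_dir="out"):
    """Build the shell commands for the get-new / drop-finished policy.

    Hypothetical helper: it only builds strings and runs nothing.
    """
    # Assumed glob: skip anything that already sits in a "done" directory.
    exclude = shlex.quote(f"*/{done_dir}/*")
    return [
        "git pull",
        f"git annex get {shlex.quote(directory)} --exclude={exclude}",
        # Left unquoted on purpose so the shell expands the glob.
        f"git annex drop {shlex.quote(directory)}/*/{shlex.quote(done_dir)}",
    ]


def run_policy(repo_path, commands):
    """Run each command inside the repository, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, shell=True, cwd=repo_path, check=True)
```

Calling `run_policy("/path/to/sound", podcast_policy())` would then do roughly what the mr update hook does. If no "out" directory exists yet, the shell passes the unexpanded glob through and the drop step will most likely fail, so a real script would want to handle that case.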

I don't see any obstacle to doing what you want. It may be that you'd need better querying facilities in git-annex (so the policy layer can know what is available where), or finer control (--exclude is a good enough hammer for me, but maybe not for you).

Comment by joey Mon Feb 14 18:08:54 2011

thanks Joey,

Is it possible to run some git annex command that tells me, for a specific directory, which files are available in another remote (and which remote, and which filenames)? I guess I could run that, do my own policy thingie, and run git annex get for the files I want.

For your podcast use case (and some of my use cases), don't you think git [annex] might actually be overkill? For example, in your podcasts use case, what value does git annex give over a simple rsync/rm script? Such a script wouldn't even need a data store to store its state, unlike git. It seems simpler and cleaner to me.

For the mpd thing, check http://alip.github.com/mpdcron/ (bad project name, it's a plugin-based "event handler"). You should be able to write a simple plugin for mpdcron that does what you want (or even interface with mpd yourself from perl/python/.. to use its idle mode to get events).

Dieter

Comment by dieter Wed Feb 16 17:32:04 2011

Whups, the comment above got stuck in the moderation queue for 27 days. I will try to check that more frequently.

In the meantime, I've implemented "git annex whereis" -- enjoy!
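A rough sketch of how a policy layer might consume that. It assumes a git-annex version where "git annex whereis --json" is available and prints one JSON object per file, with a "file" field and a "whereis" list whose entries carry "uuid" and "description"; those details are assumptions about later releases, not something stated in this thread. The function names and the sample records in the usage note are made up for illustration.

```python
import json
import subprocess
from collections import defaultdict


def whereis_lines(repo_path, directory):
    """Run whereis on a directory and return its output lines (assumes --json)."""
    result = subprocess.run(
        ["git", "annex", "whereis", "--json", directory],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def files_by_location(json_lines):
    """Group file names by the repository that holds a copy of them."""
    locations = defaultdict(list)
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for copy in record.get("whereis", []):
            # Fall back to the uuid when a repository has no description.
            name = copy.get("description") or copy.get("uuid")
            locations[name].append(record["file"])
    return dict(locations)
```

Feeding it two made-up records, one file present on "laptop" and "server" and another only on "server", gives a mapping from each repository to its files, which a policy script could then use to decide what to "git annex get".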

I find keeping my podcasts in the annex useful because it allows me to download individual episodes or podcasts easily when only low bandwidth is available (i.e., dialup), or over sneakernet. And it generally keeps everything organised.

Comment by joey Tue Mar 15 23:01:17 2011