One of my reasons for using Haskell was that it provides the possibility of some parallel processing. Although, since git-annex hits the filesystem heavily and mostly runs other git commands, there may not be a whole lot to gain.

Anyway, each git-annex command is broken down into a series of independent actions, which has some potential for parallelism.

Each action has 3 distinct phases: basically "check", "perform", and "cleanup". The perform phases are probably parallelizable; the cleanup may be (but not if it has to run git commands to stage state; it can queue commands though); the checks should be easily parallelizable, although they may access the disk or run minor git query commands, so it would probably be best not to run too many of them at once.
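
A minimal sketch of that split, assuming a hypothetical `Action` type and the `async` library (this is illustrative, not git-annex's actual code): the check phases run concurrently but bounded by a semaphore, while perform and cleanup stay sequential.

```haskell
import Control.Concurrent.Async (mapConcurrently)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)
import Control.Monad (forM_)

-- Hypothetical three-phase action; the real git-annex types differ.
data Action = Action
  { check   :: IO Bool  -- cheap test: disk access, minor git queries
  , perform :: IO ()    -- the real work
  , cleanup :: IO ()    -- may queue git commands to stage state
  }

runActions :: Int -> [Action] -> IO ()
runActions ncheckers actions = do
  -- Bound the number of simultaneous checks, since each may touch the
  -- disk or spawn a small git query command.
  sem <- newQSem ncheckers
  oks <- mapConcurrently
           (\a -> bracket_ (waitQSem sem) (signalQSem sem) (check a))
           actions
  -- Run perform and cleanup sequentially for the actions whose check passed.
  forM_ [a | (ok, a) <- zip oks actions, ok] $ \a -> perform a >> cleanup a
```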

I also think that fetching keys via rsync could be done by one rsync process when the keys are fetched from one host. This would avoid establishing a new TCP connection for every file.
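
A rough sketch of that idea (not git-annex's actual code: it assumes keys can be addressed by a path on the remote host, and uses rsync's `--files-from` option so that one rsync run, and thus one connection, transfers every key stored on a host):

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)
import System.Process (proc, readCreateProcess)

-- (host, path of the key on that host); purely illustrative.
type Source = (String, FilePath)

-- One rsync process (and so one connection) per host, instead of one per key.
fetchAll :: FilePath -> [Source] -> IO ()
fetchAll destdir sources =
  mapM_ fetchHost (groupBy ((==) `on` fst) (sortOn fst sources))
  where
    fetchHost [] = return ()
    fetchHost srcs@((host, _):_) = do
      -- --files-from=- reads the list of paths to copy from stdin;
      -- the paths are taken relative to the source root given below.
      let filelist = unlines (map snd srcs)
      _ <- readCreateProcess
             (proc "rsync" ["-a", "--files-from=-", host ++ ":/", destdir])
             filelist
      return ()
```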
Comment by Christian Fri Apr 8 08:41:43 2011

I agree with Christian.

One should first make better use of connections to remotes before exploring parallel possibilities: the requests and answers should be pipelined.

Of course, this could be implemented using the parallelism and concurrency features of Haskell.
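
A minimal sketch of the pipelining idea, assuming a hypothetical line-oriented protocol spoken over a pair of handles (this is not git-annex's actual remote protocol): all requests are written up front while a concurrent reader collects the answers, so the round-trip latency is paid roughly once instead of once per request.

```haskell
import Control.Concurrent.Async (concurrently)
import System.IO (Handle, hFlush, hGetLine, hPutStrLn)

-- Send every request without waiting for answers; read replies concurrently.
pipeline :: Handle -> Handle -> [String] -> IO [String]
pipeline toRemote fromRemote requests = snd <$> concurrently send receive
  where
    send    = mapM_ (hPutStrLn toRemote) requests >> hFlush toRemote
    receive = mapM (\_ -> hGetLine fromRemote) requests
```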

Comment by npouillard Fri May 20 16:14:15 2011