On several of my repos, the upgrade to v3 seemed to take forever. A Ctrl-C followed by another "git annex upgrade" "solved" the problem in some cases. Sometimes I also had to delete the .git/annex/journal dir to get the upgrade to finish. I didn't notice anything special about the non-working repos that would help diagnose the problem.

Well, if it happens again, why don't you use ps or strace to see what it's doing?
Comment by joey Mon Jul 4 18:58:46 2011

I've also seen this apparent hang during upgrade to v3. A few more details:

The annex in question has just under 18k files (and hence that many log files), which can slow down directory operations when they're all in the same place (like, for example, .git/annex/journal).

git-annex uses virtually no CPU time and disk IO when it's hanging like this; the first time it happened, 'ps' showed three defunct git processes, plus two "git-annex" processes and three running "git" processes:

  • git --git-dir=/mnt/annex/.git --work-tree=/mnt/annex cat-file --batch
  • git --git-dir=/mnt/annex/.git --work-tree=/mnt/annex hash-object -w --stdin-paths
  • git --git-dir=/mnt/annex/.git --work-tree=/mnt/annex update-index -z --index-info

I Ctrl+C'd that and tried again, but it hung again -- this time without the defunct gits.

An strace of the process and its children at the time of hang can be found at http://pastebin.com/4kNh4zEJ . It showed somewhat weird behaviour: When I attached with strace, it would scroll through a whole bunch of syscalls making up the open-fstat-read-close-write loop on .git/annex/journal files, but then would block on a write (sorry, don't have that in my scrollback any more so can't give more details) until I Ctrl+C'd strace; when attaching again, it would again scroll through the syscalls for a second or so and then hang with no output.

Ultimately I detached/reattached with strace about two dozen times and that caused it (?) to finish the upgrade; not really sure how to explain it, but it seems like too much of a timing coincidence.

Comment by pavel Tue Jul 5 11:54:19 2011

I've seen this kind of piping stall that is unblocked by strace before. It can vary with versions of GHC, so it would be good to know what version built git-annex (and on what OS version). I filed a bug report upstream before at http://bugs.debian.org/624389.

I really need a full strace -f from the top, or at least a complete strace -o log of git-annex from one hang through to another hang. The strace you pastebinned does not seem complete. If I can work out which specific git command is being written to when it hangs I can lift the writing out into a separate thread or process to fix it.

@pavel, you mentioned three defunct git processes, and then showed ps output for 3 git processes. Were there 6 git processes in total? And then when you ran it again you said there were no defunct gits -- were the other 3 git processes running once again?

As best I can make out from the (apparently) running git processes, it seems like the journal files for the upgrade had all been written, and the hang occurred when staging them all into the index in preparation for a commit. I have committed a change that lifts the code that does that write out into a new process, which, if I am guessing right on the limited info I have, will avoid the hang.

However, since I can't reproduce it, even when I put 200 thousand files in the journal and have git-annex process them, I can't be sure.
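
For illustration only, the general shape of this kind of fix (moving the write to a child's stdin off the thread that reads its output) can be sketched in Python. This is not git-annex's actual code, which is written in Haskell; the echoing child process, the `feed` and `run_filter` names, and the example file names are all hypothetical stand-ins for a long-running git filter command.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in child: echoes each stdin line to stdout, playing the
# role of a long-running filter such as a git command reading from a pipe.
CHILD = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line)",
]


def feed(pipe, lines):
    """Write every line to the child's stdin, then close it to signal EOF."""
    try:
        for line in lines:
            pipe.write(line + "\n")
    finally:
        pipe.close()


def run_filter(lines):
    """Send lines through the child while concurrently draining its output.

    If one thread wrote everything before reading anything, both sides could
    block once the pipe buffers fill: the child stuck writing its output,
    the parent stuck writing the child's input.
    """
    proc = subprocess.Popen(
        CHILD, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    )
    writer = threading.Thread(target=feed, args=(proc.stdin, lines))
    writer.start()
    # The main thread only reads, so the child can always make progress.
    out = [line.rstrip("\n") for line in proc.stdout]
    writer.join()
    proc.stdout.close()
    proc.wait()
    return out
```

Calling `run_filter` with tens of thousands of made-up journal file names pushes far more data through the pipes than their buffers hold, which is the situation where a single-threaded write-then-read loop would be at risk of stalling.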

Comment by joey Tue Jul 5 13:31:22 2011
I've managed to reproduce this and confirmed my fix works.
Comment by joey Tue Jul 5 14:37:21 2011
By the way, the original bug reporter mentioned deleting .git/annex/journal. This is not recommended, and doing it during an upgrade can result in git-annex losing location tracking information. You should probably run git annex fsck or reset to the old git tree (and git config annex.version 2) and upgrade again.
Comment by joey Tue Jul 5 15:06:48 2011

Ah, great, thanks very much for the quick fix!

Yes, when I mentioned three defunct git processes, there were three processes shown as "git [defunct]", plus the three git processes I listed, plus two "git-annex" processes. Upon cancel/resume, there were no defunct git processes when I checked, but by the time I found the bug report on the forum and commented I'd already successfully upgraded my annex (by repeatedly attaching strace) and couldn't really easily get at either additional 'ps' info or a fuller strace than what I posted (that was just the log from one of the attach/detach cycles), so it's a relief you managed to pinpoint the problem.

Comment by pavel Wed Jul 6 04:14:26 2011