This is git-annex's bug list. Link bugs to done when done.
wishlist: more descriptive commit messages in git-annex branch
Posted Sat Sep 17 09:24:24 2011
--git-dir and --work-tree options
Posted Sat Sep 17 09:24:24 2011
Prevent accidental merges
Posted Sat Sep 17 09:24:24 2011
Cabal dependency monadIO missing
Posted Sat Sep 17 09:24:24 2011
git annex fsck is a no-op in bare repos
Posted Sat Sep 17 09:10:11 2011
making annex-merge try a fast-forward
Posted Sat Sep 17 09:10:11 2011
annexed symlink mtime matching code is disabled on non-linux systems; needs testing
Posted Sat Sep 17 09:10:11 2011
unannex and uninit do not work when git index is broken
Posted Sat Sep 17 09:10:11 2011
unannex command doesn't all files
Posted Sat Sep 17 09:10:11 2011
Unfortunate interaction with Calibre
Posted Sat Sep 17 09:10:11 2011
softlink mtime
Posted Sat Sep 17 09:10:11 2011
minor bug: errors are not verbose enough
Posted Sat Sep 17 09:10:11 2011
git annex unused seems to check for current path
Posted Sat Sep 17 09:10:11 2011
git rename detection on file move
Posted Sat Sep 17 09:10:11 2011
S3 memory leaks
Posted Sat Sep 17 09:10:11 2011
To re-inject new content for a file, you really want to get a new key for the file. Otherwise, other repos that have the old file will never get the new content. So:
a0826293 fixed the last problem. There is a coreutils package available in macports; if it is installed you get the gnu equivalents, but they are prefixed with a g (e.g. gchmod instead of chmod). I guess not everyone will have these installed, or prefer them, on OSX.
Some more tests fail now...
On a side note, I think I found another bug in the testing. I had tested in a virtual machine running archlinux (a very recently updated version). Please see the report here: tests fail when there is no global .gitconfig for the user
Ah, great, thanks very much for the quick fix!
Yes, when I mentioned three defunct git processes, there were three processes shown as "git [defunct]", plus the three git processes I listed, plus two "git-annex" processes. Upon cancel/resume, there were no defunct git processes when I checked, but by the time I found the bug report on the forum and commented I'd already successfully upgraded my annex (by repeatedly attaching strace) and couldn't really easily get at either additional 'ps' info or a fuller strace than what I posted (that was just the log from one of the attach/detach cycles), so it's a relief you managed to pinpoint the problem.
Or, even better, wouldn't it make sense to have SHA backends always default to --fast, and only when a snag is hit, use non-fast mode for that file?
Though if we continue here, we should probably move this to its own page.
Outside the test suite, git-annex's actual use of cp puts fairly low demands on it. It tries to use cp -a or cp -p if available just to preserve whatever attributes it can preserve, but the worst case is that you have a symlink pointing to a file that doesn't have the original timestamp or whatever. And there's little expectation git preserves that stuff anyway.
I will probably try to make the test suite entirely use git clone rather than cp.
Joey, sorry, I got it wrong. I thought upgrading git didn't help and you adjusted things in git-annex instead.
Anyway, can I get around upgrading on all hosts by reformatting the drive to case-sensitive HFS+? Or will I have to upgrade git (currently version 1.7.2.5) eventually anyway?
On second thought and after some messing (trying most of the options and combinations of options on OSX for).... I tried replacing cp with gnu cp from coreutils on my OSX install, and all the tests passed. Sigh. cp -a is preserving some permissions and attributes but not all; it's not behaving in the same way as the gnu cp does... the closest thing that I have found on OSX that behaves in the same way as gnu "cp -pr" is to use "ditto".
Just doing a "ditto SOURCE DEST" in the tests passes everything. I'm not sure if it's a good idea to use this even though it works. Though this is just the tests, does it affect CopyFile.hs where "cp" is called?
It seems the objects are in the remote after all, but the remote is unaware of this fact. No idea where/why the remote lost that info, but.. Anyway, with the SHA backends, wouldn't it make sense to simply return "OK" and update the annex logs accordingly, no?
Local:
Remote:
So, there is evidence here of a circumstance caused by the other bug, as I suspected.
I don't think that manual git commit -a caused the problem. I suspect it was a subsequent git add that caused git to follow the wrong case paths and add the files in the wrong place. Ie, when you run "git add .git-annex", it recurses into .git-annex/Gm/, and adds files using that case, that were previously added from .git-annex/GM/.
For completeness, can you verify this repo's core.ignorecase setting?
I hate that you are stuck using loop filesystems to work around this bug. If my guess is correct, you don't need to, as long as you avoid manually running "git add .git-annex". I take this bug seriously. While I'm currently very involved in adding Amazon S3 support to git-annex (which will take days more of solid work), I do plan to make a loop filesystem of my own, probably vfat, so I can try and reproduce this on a case-insensitive filesystem. If you could confirm my above hypothesis, that would speed things up for me.
It's possible I will have to tweak the hash directories. Hopefully if so, I will only tweak them for new keys; if I had to do a v3 backend just to fix this stupid thing, I'd be sad -- upgrading all my offline disks from v1 to v2 took me many days.
I forgot to mention that the statfs64 stuff in OSX seems to be deprecated, see http://developer.apple.com/library/mac/#documentation/Darwin/Reference/ManPages/man2/statfs64.2.html
on a slightly different note, is anonymous pushing to the "wiki" over git allowed? I'd prefer to be able to edit stuff inline for updating some of my own comments if I can :P
Try the changes I've pushed to use statfs64 on apple.
There is actually a standardized statvfs that I'd rather use, but after the last time that I tried going with the POSIX option first only to find it was not broadly implemented, I was happy to find some already existing code that worked for some OSs.
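For anyone who wants to compare numbers across systems without touching the Haskell code, here is a rough sketch of what the standardized statvfs call reports, using Python's os.statvfs wrapper. The helper name free_bytes is made up for this illustration and is not part of git-annex.

```python
import os


def free_bytes(path):
    """Return the bytes available to unprivileged users on the filesystem holding path.

    Hypothetical helper for illustration; git-annex does this in Haskell/C.
    """
    st = os.statvfs(path)
    # f_bavail: free blocks available to non-root users.
    # f_frsize: fragment size, the unit those block counts are expressed in.
    return st.f_bavail * st.f_frsize


if __name__ == "__main__":
    print("free bytes on current filesystem:", free_bytes("."))
```

If the figure roughly matches what df reports for the same mount point, the platform's statvfs is probably usable.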
(While ikiwiki supports anonymous git push, it's a feature we have not rolled out on Branchable.com yet, and anyway, ikiwiki disallows editing existing comments that way. I would, however, be happy to git pull changes from somewhere.)
That's odd, I have the md5sha1sum package installed and it still fails with pretty much the same error
the configure script finds sha1sum, builds and starts to run.
I have pushed out a preliminary fix. The old mixed-case directories will be left where they are, and still read from by git-annex. New data will be written to new, lower-case directories. I think that once git stops seeing changes being made to mixed-case, colliding directories, the bugs you ran into won't manifest any more.
You will need to find a way to get your git repository out of the state where it complains about uncommitted files (and won't let you commit them). I have not found a reliable way to do that; git reset --hard worked in one case but not in another. May need to clone a fresh git repository.
Let me know how it works out.
What an evil little bug. In retrospect, this probably bit my own test upgrades, but I ran git annex fsck everywhere and so avoided the location log breakage.
I've fixed the bug, which also involved files with other punctuation in their names [&:%] when using the WORM backend.
The only way I have to recover repos that have already been upgraded is to run git annex fsck --fast in each clone of such a repo, which will let it rebuild the location log information. I think that is the best way to recover; ie I can't think of a way to recover that doesn't need to do everything fsck does anyway.
So, it appears that you're using git annex copy --fast. As documented, that assumes the location log is correct. So it avoids directly checking if the bare repo contains the file, and tries to upload it, and the bare repo is all like "but I've already got this file!". The only way to improve that behavior might be to let rsync go ahead and retransfer the file, which, with recovery, should require sending little data etc. But I can't say I like the idea much, as the repo already has the content, so unlocking it and letting rsync mess with it is an unnecessary risk. I think it's ok for --fast to blow up if its assumptions turn out to be wrong.
If you use git annex copy without --fast in this situation, it will do the right thing.
Version: 0.20110503
My local non-bare repo is copying to a remote bare repo.
I have been recovering in a non-bare repo.
If there is anything I can send you to help... If I removed said files and went through http://git-annex.branchable.com/bugs/No_easy_way_to_re-inject_a_file_into_an_annex/ -- would that help?
In the meantime, would it be acceptable to split the pre-commit hook into two discrete parts?
This would allow deferring "git annex fix" until post-commit (if preferred) while still keeping the safety net for unlocked files.
Alternatively, you can just load it up in ghci and see if it reports numbers that make sense:
Hi,
(I'm new to git and git annex, so please forgive any mistakes I make...)
My repo is messed up right now. The fact that I copied the repo with rsync -a back and forth from a case insensitive filesystem to a case sensitive one probably didn't help.
I believe the annexed files in .git/annex/objects/ are still using a mixed case directory hashing scheme. That's the problem I'm having. The symlinks point to the wrong case and are now broken. I don't think the latest versions of git-annex changed that (it only changed the hashing under .git-annex, right?).
Even if I clean up my repo, I think I'm still going to have a problem because I have one repo on an OS X case insensitive filesystem and my other repos on case sensitive Linux filesystems. Potentially the directory name under .git/annex/objects will have a different case. Then the symlink might have a different case than my Linux FS. Does git-annex track changes in git by the contents of the symlink? In which case the case difference would show up as a change even though there is no change?
Is it possible to change the directory hashing scheme under .git/annex/objects to use lowercase names?
Seems like you probably have files in git with nearly as long filenames as the key files. Course, you can rename those yourself.
This couldn't be changed directly in WORM without some ugly transition, but it would be possible to implement it as a WORM100 or so. OTOH, if you're going to git annex migrate, you might as well use SHA1.
Hey @fmarier. Well, this bug report is closed because you can already get rid of the symlinks. Just put a bare git repo on your fat filesystem, and use git-annex copy --to/--from there.
Now, that puts all the files that are on the device in .git/annex/objects/xx/yy/blah.mp3 -- how well rockbox would support that I don't know. And if it tries to modify or delete those files, git annex also can't help you manage those changes.
Another recent option is the directory special remote type, which again uses "xx/yy/blah.mp3" and can't track changes made to the files. This could perhaps be extended in the direction you suggest, although trying to fit this into the special remote infrastructure might not be a good fit really.
The most likely way this has to get dealt with is really by using smudge filters, which would eliminate the symlinks and allow copying a non-bare git repo onto vfat.
Yeap, that did the trick. I just tested a few separate OSX 10.6.6 systems and the tests are better behaved now, only 3 failures now.
So the tests behave better (at least we don't get resource fork errors any more)
On all the systems I tested on, I'm down to 3 failures now.
It's the same set of failures across all the OSX systems that I have tested on. Now I just need to figure out why there are still these three failures.
It exists locally; whereis tells me it exists locally, and locally only.
The object is not in the bare repo.
The file might have gone missing before I upgraded my annex backend version to 2. Could this be a factor?
Hm, if path's ok, guess there's no way around git-bisect indeed. Wonder if there's some kind of ccache for haskell...
OS is linux, amd64 on "host1" and i386 on "host2" where git-annex-shell is crashing. I'll try to come up with a commit, thanks for clarifications.
Actually I may have just been stupid and should have read the man page on statfs...
yields this...
we could just stick another if defined (__APPLE__) instead of what I previously had and it looks like it will do the right thing on OSX.
Repeated bisect with -j1, just to be sure it's not a random error, and it gave me 828a84ba3341d4b7a84292d8b9002a8095dd2382 again. Guess I'll look through the changes there a bit later and try to revert these until it works.
Not sure if it's repeatable by anyone but me (and hence worth fixing), but here's a bit more of info about the system:
(some stuff listed here as ::installed, but contains no files, since these packages detect whether ghc-7.0.2 already comes with the same/newer package version)
I meant to say it wasn't reliable when I was following the instructions for "Comment 12". I did find that just doing a "git annex copy -t externalusb ." then a "git annex drop ." from the root of my cloned and "none trusted" annexed repos was more reliable; it just means I temporarily need a load of space to get myself out of my earlier mess.
On testing this bug fix, I found a minor behavioural issue: git annex copy -f REMOTE . doesn't work as expected
I also failed to mention that, in the case when I have stray log files after what has happened in comment 2, I get this left over after a commit when git is confused...
Up until now I have just been updating the status of the staged files by hand and committing it on my mac x00; this probably isn't helping. I'd rather not lose the tracking information.
Currently fsck silently ignores --to/--from. It should at least complain if it is not supported.
Thanks to your feedback, I got it going.
Maybe those two should be added to the 'OSX how-to' in the forum
[realizes pcre-light is needed but pcre not installed on my mac]
sudo port install pcre
sudo cabal install pcre-light
[tests are failing, need haskell's quickcheck]
sudo cabal install quickcheck
I think I know how I got myself into this mess... I was on my mac workstation and I had just pulled in a change set from another repo on a linux workstation after I had made a bunch of moves. Here's a bit of a log of what happened...
If you try to clone a git repo that has a symlink over to a VFAT filesystem, you get (in its place) a regular file that contains the name of the symlink target. So why can't git-annex use that? I could still do git annex get on this file, git annex would still "know" that it's a symlink, and could replace it with a copy of the real file (instead of putting it in .git/annex).
I know if it were that simple, someone would have done it already, so what am I missing? I guess trying to get the file FROM the repository would fail because it wouldn't find the file in .git/annex? Couldn't you store a reverse mapping? You wouldn't be able to move the file around, but you already lose that once you give up symlinks. It would also be a little harder to tell which symlinks were "dangling"; I don't see an easy way to get around that. It would still be better than a bare repo..
Finally got around to report the issue to GHC tracker.
Looks quite alike (at least to the haskell-illiterate person like me) to a highest-priority issue that's hanging right at the top of the list. There are other similar reports, but they seem to be either related to PowerPC Macs, closed as invalid or due to needinfo inactivity.
Guess any further discussion belongs there, unless ghc developers will bounce it back. Thanks a lot for your help, Joey, and for sharing a great thing that git-annex is.
S3 doesn't support encryption at all, yet.
It certainly makes sense to use a different portion of the encrypted secret key for HMAC than is used as the gpg symmetric encryption key.
The two keys used in HMAC would be the secret key and the key/value key for the content being stored.
There is a difficult problem with encrypting filenames in S3 buckets, and that is determining when some data in the bucket is unused for dropunused. I've considered two choices:
gpg encrypt the filenames. This would allow dropunused to recover the original filenames, and is probably more robust encryption. But it would double the number of times gpg is run when moving content in/out, and to check for unused content, gpg would have to be run once for every item in the bucket, which just feels way excessive, even though it would not be prompting for a passphrase. Still, haven't ruled this out.
HMAC or other hash. To determine what data was unused the same hash and secret key would have to be used to hash all filenames currently used, and then that set of hashes could be intersected with the set in the bucket. But then git-annex could only say "here are some opaque hashes of content that appears unused by anything in your current git repository, but there's no way, short of downloading it and examining it to tell what it is". (This could be improved by keeping a local mapping between filenames and S3 keys, but maintaining and committing that would bring pain of its own.)
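To make the second choice a bit more concrete, here is a minimal sketch of the HMAC idea, assuming SHA1 as the digest and a flat set of names in the bucket. The function names, the digest choice and the sample keys are all assumptions made for illustration; nothing here is what git-annex actually implements.

```python
import hashlib
import hmac


def bucket_name(secret: bytes, key: str) -> str:
    """Opaque bucket name for a key: HMAC of the key under the shared secret."""
    return hmac.new(secret, key.encode("utf-8"), hashlib.sha1).hexdigest()


def unused_names(secret: bytes, used_keys, names_in_bucket):
    """Names present in the bucket that no currently used key hashes to."""
    used = {bucket_name(secret, key) for key in used_keys}
    return set(names_in_bucket) - used


if __name__ == "__main__":
    secret = b"example secret"  # stand-in for a portion of the real secret key
    stored = [bucket_name(secret, k) for k in ("key-one", "key-two")]
    # Only "key-one" is still referenced, so one opaque name is reported.
    print(unused_names(secret, ["key-one"], stored))
```

As the comment above says, the result is only a set of opaque hashes. Without a local mapping there is no way to turn them back into filenames.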
I also ran into problems on a case-insensitive HFS+ file system, it seems. I tried following the instructions in comment 12:
However, I still see upper and lower case directories in .git-annex. Did I misunderstand that they should all be lower case now?
You're missing the sha1sum command, everything else is a followon error from that. Added a hint about this to install, and in the next version configure will check for sha1sum.
Thanks for the reply @joey.
While it would certainly be possible for a bare repo to exist on my iRiver, the problem is that the music player uses the filesystem to organize files into directories like "Artist/Album/Track.ogg". So replacing that with "..../xx/yy/Track.ogg" would make it fairly difficult to browse my music collection and select the album/track I want to listen to :)
So unless I have the files physically organized like the symlinks, then it's probably not going to work very well for that particular workflow. Smudge filters are interesting though. In the meantime, I'll look into rsyncing from another box which has the right filesystem layout onto my iRiver directly.
I've posted about this on the git mailing list. It's possible that these bugs, which can be shown to affect things other than just git-annex, will be fixed in git.
I will wait a while to see. But am considering making git-annex use all-lowercase hash dirs for the log files. Maybe it could first look for .git-annex/aaaa/bbbb/foo.log, but also look for, read, and merge in any info from .git-annex/Aa/Bb/foo.log. And always write to the new style filenames. This would avoid confusing git with changes to mixed-case files, and avoid another massive transition.
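A rough sketch of that read-both, write-new idea follows. The two relative paths are passed in by the caller, and the one-entry-per-line log format is a placeholder assumed for this example, not the real location log format.

```python
import os


def read_log(repo, new_rel, old_rel):
    """Merge log lines from the new lower-case location and the old mixed-case one."""
    lines = set()
    for rel in (new_rel, old_rel):
        path = os.path.join(repo, rel)
        if os.path.exists(path):
            with open(path) as f:
                lines.update(line.rstrip("\n") for line in f if line.strip())
    return sorted(lines)


def write_log(repo, new_rel, lines):
    """Always write to the new-style location; the old file is left untouched."""
    path = os.path.join(repo, new_rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write("".join(line + "\n" for line in sorted(lines)))
```

Since nothing is ever written to the mixed-case path again, git stops seeing changes there, which is the whole point of the workaround.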
git annex fsck or reset to the old git tree (and git config annex.version 2) and upgrade again.
git-annex setkey.
What you're describing should be impossible; the error message shown can only occur if the object is present in the annex where git-annex-shell recvkey is run. So something strange is going on.
Try reproducing it by running on the remote system, git-annex-shell recvkey /remote/repo.git $key .. if you can reproduce it, I guess the next thing to do will be to strace the command and see why it's thinking the object is there.
I did not. Thanks :)
This still means that you can't re-inject a new version of a file unless you have the old one if you are using a SHA* backend, but that might be a corner case anyway.
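The reason is easy to see with a toy version of a content-derived key: the key is a function of the bytes, so new content can never land under the old key. The key layout below is only illustrative and is not claimed to match git-annex's exact format.

```python
import hashlib


def sha1_key(content: bytes) -> str:
    """Toy content-derived key; the 'SHA1-s<size>--<hash>' layout is an assumption."""
    return "SHA1-s%d--%s" % (len(content), hashlib.sha1(content).hexdigest())


if __name__ == "__main__":
    print(sha1_key(b"old version of the file"))
    print(sha1_key(b"new version of the file"))
```

Two different versions always print two different keys, so re-injecting under the old key only works if you still have the exact old bytes.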
I wouldn't say it's completely impossible for a WORM100 to work. It would just have the contract that the pair of mtime+100chars has to be unique for each unique piece of data.
But, I have yet to be convinced there's any point, since SHA1 exists.
The dtrace puzzlingly does not have the same errors shown above, but a set of mostly new errors. I don't know what to make of that.
This seems to be caused by it setting the execute bit on the file. I don't know why that would fail; it's just written the file and renamed it into place so clearly should be able to write to it.
This also suggests something breaking with permissions.
Hmm.. is utimensat available at all?
I've committed an update that may convince at least some compilers to expose this newer POSIX stuff. I don't know if it will help, please let me know.
You convince me for unannex, but isn't the goal of uninit to revert all annex operations? In the current state, a clean revert is not possible (because of the broken symlinks after uninit). Instead of copying, is using hard links out of the question?
For my needs, is the command "git annex unlock ." (from the root of the repo) a correct workaround?
What does git annex whereis say about it? Is the content actually present in annex/objects/ on the bare repository? Does that contradict whereis?
Nice work on the bisection. It's obviously a compiler bug. Having two test cases that differ in only as trivial and innocuous a commit as 828a84ba3341d4b7a84292d8b9002a8095dd2382 might help a GHC developer track it down.
We should probably forward this as a GHC bug. I hope you can find a different version or build of GHC to build git-annex with.
Ah, that gave me a good clue, my system just got pretty confused with a mixture of quickcheck and testpack installs. Would it be possible to put up a list of versions of the software you are using on your development environment? (at least the minimum tested version)
I guess it shouldn't matter to most users who are going to rely on packagers to sort these dependency issues, but it's nice to know.
Anyway, the tests build now, and they seem to fail on my (rather messy) install of haskell platform + ghc 6.12 on osx 10.6.6.
I assumed that since the tests built, then running them shouldn't be a problem. It looks like some argument isn't being passed about for the location of the .t directory that gets created. I will check the dependencies on my system again.
if you go for the two-commits version, small intermediate branches (or git-commit-tree) could be used to create a tree like this:
while the first commit (436e46f) has a "/subdir/foo → ../.git-annex/where_foo_is", the intermediate (9395665) has "/subdir/deeper/foo → ../.git-annex/where_foo_is", and the final commit (106eef2) has "/subdir/deeper/foo → ../../.git-annex/where_foo_is".
--follow uses the intermediate commit to find the history, but the intermediate commit would neither show up in git log --first-parent nor affect git diff HEAD^.. & co. (there could still be confusion over git show, though).
I'm not sure how this happened, as far as I can see, and based on my testing, git annex upgrade does stage the location log files. OTOH, I vaguely remember needing to stage some of them when I was doing my own upgrades, but that was a while ago, and I don't remember the details.
Your upgrade seems to have gone ok from the file lists you sent, so you can just:
git add .git-annex; git commit
It all boils down to the fact that the path to a relative symlink's target is determined relative to the symlink itself.
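As a small illustration of that rule (not how git-annex computes its links), the target for a link at a given depth can be derived with a relative-path calculation. The helper name link_target is made up here; the example paths are the ones used earlier in this thread.

```python
import posixpath


def link_target(link_path, object_path):
    """Relative symlink target for link_path pointing at object_path.

    Both arguments are paths relative to the repository root.
    """
    link_dir = posixpath.dirname(link_path) or "."
    # The target is resolved from the directory containing the symlink,
    # so every extra directory level adds one more "../".
    return posixpath.relpath(object_path, start=link_dir)


if __name__ == "__main__":
    for link in ("subdir/foo", "subdir/deeper/foo"):
        print(link, "->", link_target(link, ".git-annex/where_foo_is"))
```

Moving a file one directory deeper changes the link text, which is why git sees a content change on a pure rename.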
Now, if we define the symlink's target relative to the git repo's root (eg. using the $GIT_DIR environment variable, which can be a relative or absolute path itself), this unfortunately results in an absolute symlink, which would -for obvious reasons- only be usable locally:
So, what we need is the ability to record the actual variable name (instead of its value) in our symlinks.
It is possible, using variable/variant symlinks, yet I'm unsure as to whether or not this is available on Linux systems, and even if it is, it would introduce compatibility issues in multi-OS environments.
Thoughts on this?
Ok, well it looks like it isn't doing anything useful at all.
.git-annex/?? if you want to, then running git annex fsck --fast in each of your clones would regenerate the data using only the lower-case hash directories.
Yes, encrypting the symmetric key with users' regular gpg keys is the plan.
I don't think that encryption of content in a git annex remote makes much sense; the filenames obviously cannot be encrypted there. It's more likely that the same encryption would get used for a bup remote, or with the directory remote I threw in today.
ps or strace to see what it's doing.
As my comment from work is stuck in moderation:
I ran this twice:
but nothing changed
'git add .git-annex' didn't do anything. That's when I noticed that this repository is on a case-insensitive HFS+ file system.
So, if I get this right it's not a new bug, but similar to this situation: git-annex directory hashing problems on osx
Assuming that it was the file system's fault, I went ahead and upgraded yet another clone. That one (on an ext3 file system) had neither staged changes nor left-over untracked files. Everything seems to just have fallen right into place. Is that possible or still weird?
Hmm. Old versions may have forgotten to git add a .git-annex location log file when recovering content with fsck. That could be another reason things are out of sync.
But I'm not clear on which repo is trying to copy files to which.
(NB: If the files were recovered on a bare git repo, fsck cannot update the location log there, which could also explain this.)
I've seen this kind of piping stall that is unblocked by strace before. It can vary with versions of GHC, so it would be good to know what version built git-annex (and on what OS version). I filed a bug report upstream before at http://bugs.debian.org/624389.
I really need a full strace -f from the top, or at least a complete strace -o log of git-annex from one hang through to another hang. The strace you pastebinned does not seem complete. If I can work out which specific git command is being written to when it hangs I can lift the writing out into a separate thread or process to fix it.
@pavel, you mentioned three defunct git processes, and then showed ps output for 3 git processes. Were there 6 git processes in total? And then when you ran it again you said there were no defunct gits -- were the other 3 git processes running once again?
As best I can make out from the (apparently) running git processes, it seems like the journal files for the upgrade had all been written, and the hang occurred when staging them all into the index in preparation for a commit. I have committed a change that lifts the code that does that write out into a new process, which, if I am guessing right on the limited info I have, will avoid the hang.
However, since I can't reproduce it, even when I put 200 thousand files in the journal and have git-annex process them, I can't be sure.
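For readers unfamiliar with this class of hang, here is a generic sketch of the pattern in Python rather than Haskell. It is an assumption that the stall is the classic full-pipe case, and this uses a writer thread where the actual change uses a separate process. The child command just echoes its input and stands in for git.

```python
import subprocess
import sys
import threading

# Stand-in child process: copies stdin to stdout line by line.
ECHO_CMD = [sys.executable, "-c",
            "import sys\nfor line in sys.stdin: sys.stdout.write(line)"]


def filter_through(cmd, lines):
    """Feed lines to cmd's stdin from a separate thread while reading its stdout."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)

    def feed():
        for line in lines:
            proc.stdin.write(line + "\n")
        proc.stdin.close()  # signals end of input to the child

    writer = threading.Thread(target=feed)
    writer.start()
    # Reading here keeps the child's output pipe drained, so neither side
    # blocks forever on a full pipe buffer.
    output = proc.stdout.read().splitlines()
    writer.join()
    proc.wait()
    return output
```

Writing everything first and reading afterwards in a single thread can stall once the data exceeds the pipe buffer, which would be consistent with a hang that only shows up on large upgrades.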
ok, pulling the latest master and building on OSX now does this...
changing the #if 0 to 1 gives this...
it seems that commit 6634b6a6b84a924f6f6059b5bea61f449d056eee has broken support for OSX.
Just did some minor digging around and checking, this seems to satisfy the compilers etc... I have yet to confirm that it really is working as expected. Also it might be better to check for a darwin operating system instead of apple I think, though I don't know of any one really using a pure darwin OS. But for now it works (I think)
Completed git-bisect twice, getting roughly the same results:
contents of final refs/bisect:
"roughly" because second bisect gave two commits as a result, failing to build one of them (missing .o file on link, guess it's because of -j4 and bad deps in that version's build system):
Also noticed that the "git-annex-shell ..." command succeeds if run as the root user, while failing as an unprivileged one. There are no permission/access errors in "strace -f git-annex-shell ...", so I guess it could be some bug in GHC indeed.
JIC, logged a whole second bisect operation. Resulting log: http://fraggod.net/static/share/git-annex-bisect.log
Bisect script I've used (git-annex-shell dies with error code 134 - SIGABRT on GHC error):
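The script itself is not reproduced in this thread, and the sketch below is not that script. It only illustrates the exit-code handling such a wrapper needs: git bisect run treats 0 as good, 125 as "skip this commit", other codes from 1 to 127 as bad, and aborts on anything else, so a shell status of 134 (128 + SIGABRT) has to be mapped down. The function names are invented for this example.

```python
import signal

GOOD, BAD, SKIP = 0, 1, 125


def signal_name(returncode):
    """Name the signal behind a shell-style status (128+N) or a negative Python returncode."""
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return None
    try:
        return signal.Signals(signum).name
    except ValueError:
        return None


def bisect_status(returncode, build_ok=True):
    """Translate a build/test outcome into an exit code git bisect run understands."""
    if not build_ok:
        return SKIP  # commit cannot be tested, e.g. it fails to build
    return GOOD if returncode == 0 else BAD
```

A wrapper would build the tree, run the failing git-annex-shell command, and exit with bisect_status(...). Mapping build failures to 125 avoids blaming a commit that merely fails to link.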
I think the correct steps should be, make a backup first :) then ...
I eventually migrated all of my own annex'd repos and I no longer have the old hashed directories but the new ones in the form
I did lose some tracking information but not data (as far as I can see for now), but that was quickly fixed by pushing and pulling to my bare repo which tracks most of my data.
I also found that it worked a bit more reliably for me on the copies of repos that were located on case sensitive filesystems, but I guess that was expected.
git 1.7.4 does not make things better. With it, if I add first "X/foo" and then "x/bar", it commits "X/bar".
That will certainly cause problems when interoperating with a repo clone on a case-sensitive filesystem, since git-annex there will not see the location log that git committed to the wrong case directory.
It's possible there is some interoperability problem when pulling from linux like you did, onto HFS+, too. I am not quite sure. Ah, I did find one.. if I clone the repo with "X/foo" in it to a case-sensitive filesystem, and add a "x/foo" there, and pull that commit back to HFS+, git says:
Aha -- that lets me reproduce your problem with the same file being staged twice with different capitalizations, too:
And modified files that git refuses to commit, which entirely explains git-annex has issues with git when staging/commiting logs.
I think git is, frankly, buggy. It seems I will need to work around this by stopping using mixed case hashing for location logs.
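To check whether a repository is exposed to this, one option is to scan the tracked paths for names that differ only in case. A rough sketch follows; it uses simple lower-casing as a stand-in for the filesystem's real case-folding rules, and the function name is made up.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of files or directories whose names differ only by case."""
    groups = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        # Check every leading directory as well as the full path, since
        # "X/foo" and "x/bar" already collide at the directory level.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            groups[prefix.lower()].add(prefix)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


if __name__ == "__main__":
    print(case_collisions(["X/foo", "x/bar", "docs/readme"]))
```

Feeding it the output of git ls-files, split into lines, would list the directories that a case-insensitive filesystem silently merges.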
I think I have figured out why
It goes back to this piece of code (in test.hs)
It seems that on OSX it does not preserve the symbolic link information; basically cp is not gnu cp on OSX. Doing a "cp -a SOURCE DEST" seems to do the right thing on OSX. I tried it out on my archlinux workstation by replacing -pr with just -a and all the tests passed on archlinux.
I'm not sure what the implications would be of changing the cp command in the test.
Haven't given these any serious thought (which will become apparent in a moment) but hoping they will give birth to some less retarded ideas:
Bait'n'switch
In doing so, the blobs to be committed can remain unaltered, irrespective of their related files' depth in the directory hierarchy.
To prevent git from reporting ALL annexed files as unstaged changes after running post-commit hook, their paths would need to be added to .gitignore.
This wouldn't cause any issues when adding files, very little when modifying files (would need some alterations to "git annex unlock"), BUT would make git totally oblivious to removals...
Manifest-based (re)population
... thus circumventing the issue entirely, yet diffstats (et al.) would be rather uninformative.
Wide open to suggestions, criticism, mocking laughter and finger-pointing :)
I doubt that git-annex can be used with QuickCheck 1.2.0. The QuickCheck I've tested it with is 2.1.0.3 actually.
I suspect you have an old version of the TestPack haskell library on your system, that is linked against QuickCheck 1.2.0. Git-annex has been tested with TestPack 2.0.0, which uses QuickCheck 2.x.
In any case, you don't have to run 'make test' to build git-annex, and my comments above should make the main program compile, I expect.
After mulling this over, I think actually encrypting the filenames is preferable.
Did you consider encrypting the symmetric key with an asymmetric one? That's what TrueCrypt etc are using to allow different people access to a shared volume. This has the added benefit that you could, potentially, add new keys for data that new people should have access to while making access to old data impossible. Or keys per subdirectory, or, or, or.
As an aside, could the same mechanism be extended to transparently encrypt data for a remote annex repo? A friend of mine is interested in hosting his data with me, but he wants to encrypt it for obvious reasons.
I'm using git-annex to keep my music in sync between all of my different machines. What I'd love to be able to do is to also keep it in sync with my iRiver player. Unfortunately, the firmware, Rockbox, doesn't support ext3, so I'm stuck with a FAT filesystem.
I can see how the design of git-annex makes it rather difficult to get rid of the symlinks, so how about taking a different approach: something like a "git annex export DEST" which would take a destination (not a git remote) and rsync the content over to there as regular files.
Maybe "git annex sync DEST" or "git annex rsync DEST" would be better names if we want to convey the idea that the destination will be made to look like the source repo, including performing the necessary deletions.
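To make the proposal concrete, here is a rough sketch of what such an export could do, written in Python rather than as a git-annex command. The function name export_tree is hypothetical, it copies with shutil instead of rsync, and it simply skips files whose content is not present locally; none of this exists in git-annex.

```python
import os
import shutil


def export_tree(src, dest):
    """Make dest mirror src, with symlinks replaced by the files they point to.

    Hypothetical sketch of a "git annex export DEST"; not an existing command.
    Returns the sorted list of relative paths now present in dest.
    """
    wanted = set()
    for root, dirs, files in os.walk(src):
        # Never export git's own metadata (or the annex object store in it).
        dirs[:] = [d for d in dirs if d != ".git"]
        for name in files:
            path = os.path.join(root, name)
            if not os.path.isfile(path):
                # Dangling symlink: the content is not present in this repo.
                continue
            rel = os.path.relpath(path, src)
            target = os.path.join(dest, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # copy2 follows the symlink, so a regular file is written.
            shutil.copy2(path, target)
            wanted.add(rel)

    # Delete anything in dest that no longer exists in src.
    for root, dirs, files in os.walk(dest):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), dest)
            if rel not in wanted:
                os.remove(os.path.join(root, name))
    return sorted(wanted)
```

Because only regular files are written, the destination can be a FAT filesystem. The deletion pass is what would justify a name like "sync" over "export".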
I followed this to re-inject files which git annex fsck listed as missing.
For every one of those files, I get
when trying to copy the files to the remote.
-- Richard
It may be possible that OSX has some low resource limits for user processes (266 per user, I think); doing a
seems to change the behaviour of the tests a bit...
The number of failures varies as I change the value of maxprocs; I think I have narrowed it down to OSX just being stupid with limits, thus causing the tests to fail.
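For anyone wanting to compare limits between machines, the per-user process limit can also be read from a script. This is a minimal sketch using Python's standard resource module; it only reports the limit that applies to the calling process and is not the command used above. The helper names process_limits and describe are made up.

```python
import resource


def process_limits():
    """Return the (soft, hard) per-user process limit for this process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def describe(limit):
    """Render a limit value, treating RLIM_INFINITY as 'unlimited'."""
    # RLIM_INFINITY means no limit is enforced.
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


soft, hard = process_limits()
print("max user processes: soft=%s hard=%s" % (describe(soft), describe(hard)))
```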
When I reproduce this, the file is not gone, it's been moved under .git/annex/objects. There is no way an add can delete a file, since all it does is rename it. It would be good for it to error unwind and move the file back though.
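The error unwind mentioned here could look something like the following. This is a Python sketch of the idea only, not git-annex's Haskell code: add_with_unwind, its parameters and the flat objects_dir layout are all assumptions made for illustration, and the link argument exists just so a failure can be simulated.

```python
import os


def add_with_unwind(path, objects_dir, key, link=os.symlink):
    """Move path into objects_dir under key and leave a symlink behind.

    If the symlink cannot be created, move the content back so the
    file is never left stranded in the object store.
    """
    stored = os.path.join(objects_dir, key)
    os.makedirs(objects_dir, exist_ok=True)
    os.rename(path, stored)
    try:
        # Link relative to the file's own directory, as a relocatable link.
        link(os.path.relpath(stored, os.path.dirname(path) or "."), path)
    except OSError:
        # Error unwind: restore the original file, then report the failure.
        os.rename(stored, path)
        raise
    return stored
```

The point is that the rename is the only step that moves content, so undoing it is enough to put the working tree back as it was.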
Alright, I have created a case-insensitive HFS+ filesystem here on my linux laptop.
I have not been able to trick git into staging the same file with 2 different capitalizations yet.
It might be helpful if you can send me a copy of a git repository where 'git add -i' shows the same file staged with two capitalizations. Leaving out .git/annex of course. (joey@kitenet.net; a tarball would probably work)
It seems that git add only started working properly on case-insensitive filesystems quite recently. The commit in question is 5e738ae820ec53c45895b029baa3a1f63e654b1b, "Support case folding for git add when core.ignorecase=true", which was first released in git 1.7.4, January 30, 2011. If you don't yet have that version, that could explain the problem entirely. In about half an hour (dialup!) I will have downloaded an older git and will see if I can reproduce the problem with it.
I'm running ghc 6.12.3 with the corresponding haskell-platform package from the HP site, which I installed in preference to the macports version of haskell-platform (it's quite old). It seems that when you install quickcheck, the version installed is 2.4.0.1 and not 1.2.0, which git-annex depends on for its tests.
it fails with this
I'd imagine if I could downgrade, it would compile and pass the tests (I hope)
git annex unlock; modify; git-annex lock
If you install the monads-fd package (with cabal install for instance), then you can no longer build git-annex:
I'm leaving this bug open because this feature, however minor, is not available on OSX and BSD.
I have added a partial implementation using lutimes(3), which should be available on the BSDs. However, it's ifdefed out due to a casting problem: The TimeSpec uses a CTime, while lutimes uses a CLong. These data types may be internally the same on some or all platforms, so if you want this feature you can try changing the "ifdef 0" in Touch.hsc to 1 and try it, see if "git annex add" mirrors file modification time in created symlinks, and let me know.
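A quick way to see whether a given platform can set times on a symlink at all is a sketch like the one below. It is Python rather than the Touch.hsc code, it relies on os.utime with follow_symlinks=False instead of calling lutimes(3) directly, and the name mirror_mtime is invented for this example.

```python
import os


def mirror_mtime(source, link):
    """Copy source's access and modification times onto the symlink itself.

    Returns False where the platform cannot set times on a symlink.
    """
    if os.utime not in os.supports_follow_symlinks:
        return False
    st = os.stat(source)
    # follow_symlinks=False changes the link, not the file it points to.
    os.utime(link, ns=(st.st_atime_ns, st.st_mtime_ns), follow_symlinks=False)
    return True
```

Comparing os.lstat(link).st_mtime with os.stat(source).st_mtime afterwards shows whether the link's own timestamp was updated.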
@seqq git-annex always uses the same case when creating and accessing the files pointed to by the symlinks. So it will not matter if it's used on a case-insensitive system, or on a case-insensitive but case-preserving one like OSX.
You need to fix up the cases of the files in .git/annex/objects to what it expects. I'm not sure what would be the best way to do that. The method described in recover data from lost+found might work well.
Keep in mind that lots of small files may have significant overhead, so a warning that it's not possible to make sure there's enough space would make sense for certain corner cases. Actually finding out the exact overhead is beyond git-annex's scope and, given transparent compression etc., its ability, but a warning, optionally with a "do you want to continue" prompt, can't hurt.
-- RichiH
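As a rough illustration of the small-file overhead point, the following Python sketch rounds each file up to whole blocks and returns a warning when only the raw sizes fit. The 4096-byte block size and the function names estimated_usage and check_space are assumptions for the example; it ignores inodes, compression and everything else a real filesystem does.

```python
import shutil


def estimated_usage(file_sizes, block_size=4096):
    """Estimate bytes consumed on disk when every file occupies whole blocks.

    block_size is an assumed value; real filesystems differ, and
    compression or tail packing makes the true figure unknowable here.
    """
    total = 0
    for size in file_sizes:
        blocks = -(-size // block_size)  # ceiling division
        total += blocks * block_size
    return total


def check_space(file_sizes, dest, block_size=4096):
    """Return "ok", "warn" or "insufficient" for copying the files to dest."""
    free = shutil.disk_usage(dest).free
    if sum(file_sizes) > free:
        return "insufficient"
    if estimated_usage(file_sizes, block_size) > free:
        # Raw sizes fit, but per-file overhead might not: warn, do not refuse.
        return "warn"
    return "ok"
```

With these assumptions, three files of 1, 4096 and 4097 bytes add up to 8194 bytes of content but an estimated 16384 bytes on disk, which is the kind of gap a warning would cover.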
Yes, makes sense. I am so used to using --fast, I forgot a non-fast mode existed. I still think it would be a good idea to fall back to non-fast mode if --fast runs into an error from the remote, but as that is well beyond my abilities, how about this patch?
I've also seen this apparent hang during upgrade to v3. A few more details:
The annex in question has just under 18k files (and hence that many log files), which can slow down directory operations when they're all in the same place (like, for example, .git/annex/journal).
git-annex uses virtually no CPU time and disk IO when it's hanging like this; the first time it happened, 'ps' showed three defunct git processes, with two "git-annex" processes and three "git" procs:
I Ctrl+C'd that and tried again, but it hung again -- this time without the defunct gits.
An strace of the process and its children at the time of hang can be found at http://pastebin.com/4kNh4zEJ . It showed somewhat weird behaviour: When I attached with strace, it would scroll through a whole bunch of syscalls making up the open-fstat-read-close-write loop on .git/annex/journal files, but then would block on a write (sorry, don't have that in my scrollback any more so can't give more details) until I Ctrl+C'd strace; when attaching again, it would again scroll through the syscalls for a second or so and then hang with no output.
Ultimately I detached/reattached with strace about two dozen times and that caused it (?) to finish the upgrade; not really sure how to explain it, but it seems like too much of a timing coincidence.
I use Debian Squeeze, I have the Debian package cabal-install 0.8.0-1 installed.
This installed: Cabal-1.10.2.0, zlib-0.5.3.1, cabal-install 0.10.2. No version of monad-control or monadIO installed.
After I added a dependency for monadIO to the git-annex.cabal file, it installed correctly.
-- Thomas