git rename detection on file move

It's unfortunate that git-annex sorta defeats git's rename detection.

When an annexed file is moved to a different directory (specifically, a directory that is shallower or deeper than the old directory), the symlink often has to change. And so git log cannot --follow back through the rename history, since all it has to go on is that symlink, which it effectively sees as a one line file containing the symlink target.
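To illustrate why the symlink has to change, here is a minimal sketch that computes the relative target a link needs at a given depth. The `rel_target` helper and the flat `.git/annex/KEY` location are assumptions made for illustration; real annex object paths are longer, and this is not how git-annex itself computes them.

```shell
#!/bin/sh
# rel_target FILE: print the relative symlink target that FILE (a path
# relative to the repository root) would need in order to reach the
# assumed annex object .git/annex/KEY.
rel_target() {
    case $1 in
        /*) echo "rel_target: need a repo-relative path" >&2; return 1 ;;
    esac
    dir=$(dirname "$1")
    up=""
    # One "../" for every directory component above FILE.
    while [ "$dir" != "." ]; do
        up="../$up"
        dir=$(dirname "$dir")
    done
    printf '%s.git/annex/KEY\n' "$up"
}

rel_target subdir/foo          # prints ../.git/annex/KEY
rel_target subdir/deeper/foo   # prints ../../.git/annex/KEY
```

The two outputs differ, so the blob git stores for the symlink differs after the move, even though the annexed content is the same.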

One way to fix this might be to do the git annex fix after the rename is committed. This would mean that a commit would result in new staged changes for another commit, which is perhaps startling behavior.

The other way to fix it is to stop using symlinks; see smudge.

use mini-branches

If you go for the two-commit version, small intermediate branches (or git-commit-tree) could be used to create a tree like this:

*   commit 106eef2
|\  Merge: 436e46f 9395665
| | 
| |     the main commit
| |   
| * commit 9395665
|/  
|       intermediate move
|  
* commit 436e46f
| 
|     ...

Here the first commit (436e46f) has "/subdir/foo → ../.git-annex/where_foo_is", the intermediate commit (9395665) has "/subdir/deeper/foo → ../.git-annex/where_foo_is", and the final commit (106eef2) has "/subdir/deeper/foo → ../../.git-annex/where_foo_is".

--follow uses the intermediate commit to find the history, but the intermediate commit would neither show up in git log --first-parent nor affect git diff HEAD^.. & co. (there could still be confusion over git show, though).
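A rough sketch of how such a tree could be built with plain git plumbing, in a throwaway repository. The symlinks here are created by hand and dangle, and rewriting the link target stands in for what "git annex fix" would do. The repository layout, commit messages and variable names are all assumptions for illustration.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

# First commit: the annexed file at its old location.
mkdir -p subdir
ln -s ../.git-annex/where_foo_is subdir/foo
git add subdir/foo
git commit -q -m "add foo"
first=$(git rev-parse HEAD)

# Intermediate commit: move the symlink but keep its (now stale) target,
# so the blob is byte-for-byte identical and the rename is exact.
mkdir -p subdir/deeper
git mv subdir/foo subdir/deeper/foo
intermediate=$(git commit-tree "$(git write-tree)" -p "$first" -m "intermediate move")

# Main commit: correct the target (stand-in for "git annex fix") and
# record both the first commit and the intermediate one as parents.
rm subdir/deeper/foo
ln -s ../../.git-annex/where_foo_is subdir/deeper/foo
git add subdir/deeper/foo
main=$(git commit-tree "$(git write-tree)" -p "$first" -p "$intermediate" -m "the main commit")
git reset -q --soft "$main"

git log --graph --oneline
# The intermediate commit is what carries the exact rename:
git diff -M --name-status "$first" "$intermediate"
```

Because the first parent of the main commit is the original commit, `git log --first-parent` lists only the main and the first commit, and the intermediate commit is reachable only through the second parent.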

Comment by chrysn — Wed Mar 9 19:47:48 2011

Use variable symlinks, relative to the repo's root?

It all boils down to the fact that the path to a relative symlink's target is determined relative to the symlink itself.
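A small demonstration of that point, as a sketch in a scratch directory (the directory and file names are made up for illustration): the stored target does not change when the link is moved, so the link dangles once its own location changes depth.

```shell
#!/bin/sh
# Scratch layout: a fake annex plus a folder holding the link.
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p .git/annex somefolder/deeper
touch .git/annex/realfile

# A relative link created at its final location resolves fine.
ln -s ../.git/annex/realfile somefolder/file
[ -e somefolder/file ] && echo "somefolder/file resolves"

# After moving it one level deeper the stored target is unchanged, so it
# now points at somefolder/.git/annex/realfile, which does not exist.
mv somefolder/file somefolder/deeper/file
readlink somefolder/deeper/file
[ -e somefolder/deeper/file ] || echo "somefolder/deeper/file is dangling"
```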

Now, if we define the symlink's target relative to the git repo's root (e.g. using the $GIT_DIR environment variable, which can itself be a relative or absolute path), this unfortunately results in an absolute symlink, which would, for obvious reasons, only be usable locally:

user@host:~$ mkdir -p tmp/{.git/annex,somefolder}
user@host:~$ export GIT_DIR=~/tmp
user@host:~$ touch $GIT_DIR/.git/annex/realfile
user@host:~$ ln -s $GIT_DIR/.git/annex/realfile $GIT_DIR/somefolder/file
user@host:~$ ls -al $GIT_DIR/somefolder/
total 12
drwxr-x--- 2 user group 4096 2011-03-10 16:54 .
drwxr-x--- 4 user group 4096 2011-03-10 16:53 ..
lrwxrwxrwx 1 user group   33 2011-03-10 16:54 file -> /home/user/tmp/.git/annex/realfile
user@host:~$

So, what we need is the ability to record the actual variable name (instead of its value) in our symlinks.

It is possible, using variable/variant symlinks, yet I'm unsure as to whether or not this is available on Linux systems, and even if it is, it would introduce compatibility issues in multi-OS environments.

Thoughts on this?

Comment by praet — Thu Mar 10 12:50:28 2011

comment 3

Interesting, I had not heard of variable symlinks before. AFAIK linux does not have them.

Comment by joey — Tue Mar 15 23:03:19 2011

Brainfart

Haven't given these any serious thought (which will become apparent in a moment) but hoping they will give birth to some less retarded ideas:

Bait'n'switch

pre-commit: Replace all staged symlinks (when pointing to annexed files) with plaintext files containing the key of their respective annexed content, re-stage, and add their paths (relative to repo root) to .gitignore.
post-commit: Replace the plaintext files with (git annex fix'ed) symlinks.

In doing so, the blobs to be committed can remain unaltered, irrespective of their related files' depth in the directory hierarchy.

To prevent git from reporting ALL annexed files as unstaged changes after running post-commit hook, their paths would need to be added to .gitignore.

This wouldn't cause any issues when adding files, very little when modifying files (would need some alterations to "git annex unlock"), BUT would make git totally oblivious to removals...
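A very rough sketch of the pre-commit half of the swap, for a single path. The `to_keyfile` helper is hypothetical, it assumes the key is simply the last component of the link target, and it leaves out the re-staging and .gitignore steps entirely.

```shell
#!/bin/sh
# to_keyfile PATH: if PATH is a symlink into .git/annex, replace it with
# a plaintext file that contains only the key (assumed to be the last
# path component of the link target). Anything else is left untouched.
to_keyfile() {
    [ -L "$1" ] || return 0
    target=$(readlink "$1")
    case $target in
        .git/annex/*|*/.git/annex/*) ;;
        *) return 0 ;;
    esac
    key=$(basename "$target")
    rm "$1"
    printf '%s\n' "$key" > "$1"
}
```

Two links to the same key at different depths end up as identical files, which is the property that would keep the committed blob stable across moves.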

Manifest-based (re)population

Keep a manifest of all annexed files (key + relative path)
DON'T track the symlinks (.gitignore)
Populate/update the directory structure using a post-commit hook.

... thus circumventing the issue entirely, yet diffstats (et al.) would be rather uninformative.
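The population step could look roughly like the sketch below. The manifest format (one "KEY<TAB>PATH" line per file), the `populate` helper and the flat `.git/annex/KEY` object location are all assumptions for illustration, not anything git-annex provides.

```shell
#!/bin/sh
# populate MANIFEST: run from the repository root. For every
# "KEY<TAB>PATH" line, (re)create PATH as a relative symlink to the
# assumed object location .git/annex/KEY.
populate() {
    while IFS="$(printf '\t')" read -r key path; do
        [ -n "$key" ] && [ -n "$path" ] || continue
        case $path in /*) continue ;; esac   # only repo-relative paths
        dir=$(dirname "$path")
        up=""
        d=$dir
        # One "../" per directory level between PATH and the repo root.
        while [ "$d" != "." ]; do
            up="../$up"
            d=$(dirname "$d")
        done
        mkdir -p "$dir"
        rm -f "$path"
        ln -s "${up}.git/annex/$key" "$path"
    done < "$1"
}
```

A post-commit hook could then call something like `populate manifest` after each commit. Since the links are regenerated from the manifest, their depth-dependent targets never need to be tracked.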

Wide open to suggestions, criticism, mocking laughter and finger-pointing :)

Comment by praet — Sun Mar 20 16:11:27 2011

comment 5

In the meantime, would it be acceptable to split the pre-commit hook into two discrete parts?

This would allow deferring "git annex fix" until post-commit (if preferred) while still keeping the safety net for unlocked files.

Comment by praet — Mon Mar 21 15:58:34 2011
