hook

This special remote type lets you store content in a remote of your own devising.

It's not recommended to use this remote type when another, such as rsync or directory, will do. If your hooks are not carefully written, data could be lost.

example

Here's a simple example that stores content on clay tablets. If you implement this example in the real world, I'd appreciate a tour next Apert! :) --Joey

# git config annex.cuneiform-store-hook 'tocuneiform < "$ANNEX_FILE" | tablet-writer --implement=stylus --title="$ANNEX_KEY" | tablet-proofreader | librarian --shelve --floor=$ANNEX_HASH_1 --shelf=$ANNEX_HASH_2'
# git config annex.cuneiform-retrieve-hook 'librarian --get --floor=$ANNEX_HASH_1 --shelf=$ANNEX_HASH_2 --title="$ANNEX_KEY" | tablet-reader --implement=coffee --implement=glasses --force-monastic-dedication | fromcuneiform > "$ANNEX_FILE"'
# git config annex.cuneiform-remove-hook 'librarian --get --floor=$ANNEX_HASH_1 --shelf=$ANNEX_HASH_2 --title="$ANNEX_KEY" | goon --hit-with-hammer'
# git config annex.cuneiform-checkpresent-hook 'librarian --find --force-distrust-catalog --floor=$ANNEX_HASH_1 --shelf=$ANNEX_HASH_2 --title="$ANNEX_KEY" --shout-title'
# git annex initremote library type=hook hooktype=cuneiform encryption=none
# git annex describe library "the reborn Library of Alexandria (upgrade to bronze plates pending)"

Can you spot the potential data loss bugs in the above simple example? (Hint: What happens when the tablet-proofreader exits nonzero?)
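
The hint points at how shells report pipeline status: by default, a pipeline's exit status is that of its last command, so a failure in a middle stage goes unnoticed and the hook still appears to succeed. The sketch below shows this with false standing in for a failing tablet-proofreader, and then with bash's pipefail option, which makes the whole pipeline fail if any stage fails. Using bash here is an assumption (it must be installed), and this addresses only the one bug named in the hint.

```shell
# 'false' stands in for a middle stage that fails, such as tablet-proofreader;
# 'true' stands in for a final stage that succeeds, such as librarian.
plain_status=0
sh -c 'false | true' || plain_status=$?
echo "plain pipeline exit status: $plain_status"        # prints 0

# With pipefail set, bash reports the failure of any stage in the pipeline.
pipefail_status=0
bash -c 'set -o pipefail; false | true' || pipefail_status=$?
echo "pipefail pipeline exit status: $pipefail_status"  # prints 1
```

One option is to move the store pipeline into a small bash script that begins with set -o pipefail, and point the hook setting at that script.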

configuration

These parameters can be passed to git annex initremote:

encryption - Required. Either "none" to disable encryption, or a value that can be looked up (using gpg -k) to find a gpg encryption key that will be given access to the remote, or "shared" which allows every clone of the repository to access the encrypted data. Note that additional gpg keys can be given access to a remote by running enableremote with the new key id. See encryption.
hooktype - Required. This specifies a collection of hooks to use for this remote.

hooks

Each type of hook remote is specified by a collection of hook commands. Each hook command is run as a shell command line, and should return nonzero on failure, and zero on success.

These environment variables are used to communicate with the hook commands:

ANNEX_KEY - name of a key to store, retrieve, remove, or check.
ANNEX_FILE - a file containing the key's content
ANNEX_HASH_1 - short stable value, based on the key, can be used for hashing into 1024 buckets.
ANNEX_HASH_2 - another hash value, can be used for a second level of hashing

The settings to use in git config for the hook commands are as follows:

annex.$hooktype-store-hook - Command run to store a key in the special remote. ANNEX_FILE contains the content to be stored.
annex.$hooktype-retrieve-hook - Command run to retrieve a key from the special remote. ANNEX_FILE is a file that the retrieved content should be written to. The file may already exist with a partial copy of the content (or possibly just garbage), to allow for resuming of partial transfers.
annex.$hooktype-remove-hook - Command to remove a key from the special remote.
annex.$hooktype-checkpresent-hook - Command to check if a key is present in the special remote. Should output the key name to stdout, on its own line, if and only if the key has been actively verified to be present in the special remote (caching presence information is a very bad idea); all other output to stdout will be ignored.
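
To make the four settings concrete, here is a minimal sketch of a hypothetical hooktype named demo that keeps each key as a file in a local directory. The script name demo-hook, the DEMO_STORE variable, and its default path are assumptions for illustration; only the ANNEX_* variables come from the interface described above. The directory special remote already covers this case, so treat this as a template rather than something to deploy.

```shell
#!/bin/sh
# Hypothetical configuration, assuming this file is installed on PATH as demo-hook:
#   git config annex.demo-store-hook 'demo-hook store'
#   git config annex.demo-retrieve-hook 'demo-hook retrieve'
#   git config annex.demo-remove-hook 'demo-hook remove'
#   git config annex.demo-checkpresent-hook 'demo-hook checkpresent'
#   git annex initremote demo type=hook hooktype=demo encryption=none

# DEMO_STORE is an assumption of this sketch, not a git-annex variable.
DEMO_STORE="${DEMO_STORE:-/tmp/demo-store}"

# Two-level directory for the current key, built from the two hash values.
demo_dir() {
    printf '%s/%s/%s' "$DEMO_STORE" "$ANNEX_HASH_1" "$ANNEX_HASH_2"
}

# Copy to a temporary name first, then rename, so a failed copy never
# leaves a partial file under the key's name. Any failing step makes the
# whole command exit nonzero.
demo_store() {
    dir=$(demo_dir) &&
    mkdir -p "$dir" &&
    cp "$ANNEX_FILE" "$dir/$ANNEX_KEY.tmp" &&
    mv "$dir/$ANNEX_KEY.tmp" "$dir/$ANNEX_KEY"
}

# Overwrites whatever is already in ANNEX_FILE instead of resuming.
demo_retrieve() {
    cp "$(demo_dir)/$ANNEX_KEY" "$ANNEX_FILE"
}

demo_remove() {
    rm -f "$(demo_dir)/$ANNEX_KEY"
}

# Prints the key name only after looking at the store itself; prints
# nothing (and still exits zero) when the key is absent.
demo_checkpresent() {
    if [ -f "$(demo_dir)/$ANNEX_KEY" ]; then
        printf '%s\n' "$ANNEX_KEY"
    fi
}

# Dispatch when run as a script, e.g. 'demo-hook store'.
case "${1:-}" in
    store|retrieve|remove|checkpresent) "demo_$1" ;;
    "") ;;  # no argument: only define the functions (useful when sourcing)
    *) echo "usage: demo-hook store|retrieve|remove|checkpresent" >&2; exit 2 ;;
esac
```

Chaining the store steps with && is the design choice that matters most here: the hook reports success only if every step succeeded, which is what the zero/nonzero contract requires.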

Asynchronous hooks?

Is there a way to use asynchronous remotes? Interaction with git annex would have to split the part of initiating some action from completing it.

I imagine I could git annex copy a file to an asynchronous remote and the command would almost immediately complete. Later I would learn that the transfer is completed, so the hook must be able to record that information in the git-annex branch. An additional plumbing command seems required here as well as a way to indicate that even though the store-hook completed, the file is not transferred.

Similarly git annex get would immediately return without actually fetching the file. This should already be possible by returning non-zero from the retrieve-hook. Later the hook could use plumbing level commands to actually stick the received file into the repository.

The remove-hook should need no changes, but the checkpresent-hook would be more like a trigger without any actual result. The extension of the plumbing required for the extension to the retrieve-hook could update the location log. A downside here is that you never know when a fsck has completed.

My proposal does not include a way to track the completion of actions, but relies on the hook to always complete them reliably. It is not clear that this is the best road for asynchronous hooks.

One use case for this would be a remote that is only accessible via uucp. Are there other use cases? Is the drafted interface useful?

Comment by helmut — Sat Oct 13 05:46:14 2012
