g3p-hash-1.0.0.2: Global Password Prehash Protocol
Safe HaskellSafe-Inferred
LanguageHaskell2010

Crypto.G3P

Description

The Global Password Prehash Protocol (G3P) is a slow, attribution-armored password hash function and cryptographic key derivation function. It supports self-documenting deployments whose password hashes are traceable or useless after they have been stolen. This secondary security goal seeks to use cryptoacoustics to provide embedded attributions that are as difficult as possible for an adversarial implementation to remove.

The G3P revisits the role of cryptographic salt, splitting the salt into the cartesian product of the seguid, username, and tag parameters. Any parameter with "tag" as part of the name is an embedded attribution to anybody providing the inputs to the username or password parameters. Tags are themselves directly self-documenting embedded attributions, in the sense that one cannot easily or efficiently replace the tag with anything else without losing the ability to compute the correct hash function.

The seguid corresponds to the key used for every call to HMAC-SHA256, right up until final output expansion. In this way the G3P mimicks the construction of HKDF, with the seguid corresponding to HKDF's salt parameter. The G3P also mimicks PBKDF2 used in an alternate mode of operation.

The seguid can be trivially replaced with a precomputed HMAC key, thus the seguid is not a direct tag. However this precomputed key is a cryptographic hash of the seguid, and for this reason the seguid is capable of serving as an indirect tag, which the Seguid Protocol is designed to utilize via Self-Documenting Globally Unique Identifiers (seguids).

It is strongly recommend a deployment identify itself with a single 64-byte (512-bit) seguid, and the deployment's choice of plaintext messages to be delivered via tags. These salts can be constants across the entire deployment, as the username is intended to be used as the final bit of salt within a deployment.

In a traditional password hash function, the salt is a random bytestring typically between 8 and 32 bytes long. One of its primary purposes is to identifiy a unique hash function so that one cannot attempt to crack multiple password hashes with a single key-stretching computation. Oftentimes this is implemented by storing a salt per user.

However, in the context of a client-side prehash, storing a salt per user has the potential to leak whether or not an account exists, or if a password has changed. The G3P has the option to eliminate these complications, because it is safe to use a plain username as the salt, in addition to the deployment-identifying seguid and tags.

On the other hand, if one is aware of the potential issues surrounding the implementation of a random per-user salt in a client-side hashing context, and is willing to mitigate or live with them, then there are potential advantages to using a random salt as the input to the G3P's username parameter instead.

All parameter names are suggestive, not prescriptive. Usage is ultimately defined by the deployment.

When somebody is guessing a username, they must also know (or guess) the password. However, the username need not be revealed to somebody who is guessing the password, as the raw username can always be replaced by a precomputed hash. If this intentional feature is not desired, a deployment might choose to swap the username and password, as these inputs are otherwise functionally identical.

The usage and interpretation(s) of any given parameter is always defined by the deployment, and is never defined by offical G3P documentation or specifications.

The G3P always has room for more salt. It doesn't really make sense to inject more than 256 bits of entropy into the username parameter, because when the G3P is partially applied to a constant username, the raw input can be replaced with a SHA256 state. This is not true of any of the tags: it doesn't matter how long it is, the whole tag must be present for the hash computation to be correct.

Every parameter with the word _tag_ in its name exhibits this property. Theoretically, one could specify a G3P-based hash function that requires terabytes of salt to be hashed billions of times over. However it is unclear what purpose such an impractical specification might serve.

This initial variant of the G3P employs a combination of PHKDF and bcrypt. PHKDF serves as the primary cryptoacoustic component, and bcrypt serves as the primary key-stretching component of the G3P. Both are secondarily used in the alternate role as well, with the PHKDF adding a tiny bit of key stretching and bcrypt providing significant additional cryptoacoustic plaintext repetitions.

  1. Every bit of every parameter matters. Every boundary between parameters matters. The presence and position of every null byte and every empty string matters. There aren't supposed to be any trivial collisions, the only exception being null-byte extension collisions on the seguid, which serves as an HMAC-SHA256 key.
  2. Except for the tweaks, any change to any parameter requires restarting the PHKDF key-stretching computation from somewhere in the very first call to HMAC.
  3. All input arguments are hardened against length-related timing side channels in various different ways.

    At one extreme, the username, password, and long tag have the most aggressive length hardening in the conventional sense, exhibiting no timing side channels except on multi-kilobyte inputs, after which the timing impacts are minimized.

    At another extreme, the domain tag exhibits severe yet predictable timing side channels transitioning from 19 to 20 bytes and every 64 bytes thereafter. However, the domain tag is otherwise free of timing-based side channels, so it too is hardened in its own way.

The design I converged upon employs fairly complicated data encoding procedures. Unfortunately, this provides a fair bit of surface area for subtly wrong implementations that work most of the time, but will return garbage on certain lengths of inputs. I hope that this will eventually be remediated with a more comprehensive suite of test vectors.

Note that the username, password, long-tag, and credentials vector are all horn-loaded inputs in the sense that they are consumed a constant number of times near the beginning of the hashing protocol, and after each PHKDF round, the hash with the least key-stretching applied is discarded.

This implies that particularly paranoid password-handling implementations can eliminate the password from memory even before key-stretching is complete. Additionally, assuming all the sensitive secrets are contained in horn-loaded parameters, this implies the key-stretching computation can be relocated at nearly any time with full credit for any key-stretching already performed.

One of the associated costs is that collisions on horn-loaded inputs can be found over the entire G3P by "only" colliding the first call to HMAC-SHA256, G3Pb1 alfa. If it were trivial to produce collisions on HMAC-SHA256, this would very likely make collisions on the horn-loaded inputs trivial. However such an attack would be unlikely to be able to immediately produce collisions that vary any of the other inputs. This is because all the other inputs are repeated elsewhere in the protocol, thus colliding G3Pb1 alfa isn't enough to collide the final output of the G3P.

This "cost" seems acceptable in the context of password-based authentication flows, where collision resistance and second preimage resistance are not directly relevant. What is crucially important is preimage resistance and maximizing the cost of parallelizing multiple key-stretching computations while minimizing the latency of a single key-stretching computation.

Synopsis

Documentation

data G3PInputBlock Source #

These input parameters are grouped together because the envisioned use for them is that they are constants (or near-constants) specified by a deployment. User-supplied inputs would typically not go here. In this role, all these parameters function as salt.

The seguid parameter acts as a deployment-wide salt. Cryptographically speaking, the most important thing a deployment can do is specify a constant seguid. It is highly recommended that the seguid input be a genuine Self-Documenting Globally Unique Identifier attesting to the parameters, purposes, and public playbook of the protocol for y'all to follow to use the deployment to spec.

The remaining string parameters are all directly-documenting, embedded attributions. A deployment can use these tags to encode a message into the password hash function so that it must be known to whomever can compute it. There are a variety of different parameters because there are different lengths of messages that can be expressed for free, and there are different incremental costs for exceeding that limit.

It is particularly important to include some kind of actionable message in the domainTag, longTag, bcryptTag, and bcryptSaltTag parameters. Specifying an empty string in any of these parameters means that a significant quantity of cryptoacoustic messaging space will be filled with silence.

Especially useful messages include URIs, legal names, and domain names.

Constructors

G3PInputBlock 

Fields

  • g3pInputBlock_seguid :: !ByteString

    HMAC-SHA256 key, usable as a high-repetition indirect tag via self-documenting globally unique identifiers (seguids).

  • g3pInputBlock_domainTag :: !ByteString

    plaintext tag with one repetition per PHKDF round. 0-19 bytes are free, 20-83 bytes cost a additional sha256 block per PHKDF round, with every 64 bytes thereafter incurring a similar cost.

    Tags up to 83 or maybe even 147 bytes long might be reasonable. In the case of longer domain tags, it is strategically advantageous to ensure that the first 32 bytes are highly actionable.

    This parameter provides domain separation. A suggested value is a ICANN domain name controlled by the deployment. The name is also a bit of an homage to the "realm" parameter of HTTP basic authentication, which in part inspired it.

  • g3pInputBlock_longTag :: !ByteString

    plaintext tag with 1x repetition, then cycled for roughly 8 kilobytes. Constant time on inputs up to nearly 5 kilobytes.

    Overages incur one sha256 block per 64 bytes.

  • g3pInputBlock_tags :: !(Vector ByteString)

    plaintext tags with 3x repetition. Constant-time on 0-63 encoded bytes, which includes the length encoding of each string. Thus 60 of those free bytes are usable if the tags vector is a single string, or less if it contains two or more strings.

    Overages incur three sha256 blocks per 64 bytes.

    This parameter is notable because it is the least expensive purely auxiliary input that is not horn-loaded. Thus if you want a very long salt input that provides a bit of extra collision resistance, (because the collision resistance of HMAC-SHA256 isn't enough?!) this would be a logical candidate input location to consider.

  • g3pInputBlock_phkdfRounds :: !Word32

    How expensive will the PHKDF component be? An optimal implementation computes exactly three SHA256 blocks per round if the domain tag is 19 bytes or less, plus a reasonably large but constant number of additional blocks. I recommend at least 20,000 rounds, if not 40,000. You might consider adjusting that recommendation downward in the case of domain tags that exceed 19 bytes in length: 15,000 rounds of PHKDF with a domain tag that is 83 bytes long should cost about the same number of SHA256 blocks as 20,000 rounds of PHKDF with a domain tag that is 19 bytes long.

  • g3pInputBlock_bcryptRounds :: !Word32

    How expensive will the bcrypt component be? 4000 rounds recommended, give or take a factor of 2 or so. Each bcrypt round is approximately as time consuming as 60 PHKDF rounds. Using the recommendation, the cost should be dominated by bcrypt.

  • g3pInputBlock_bcryptTag :: !ByteString

    Repeated once or twice per bcrypt round, plus once in PHKDF. This tag has exactly two full repetitions per bcrypt round when the tag is up to 56 bytes long. Above 56 bytes, this tag is cyclically extended to 112 bytes and then split into two strings of 56 bytes, each repeated once. Hashed once in PHKDF to avoid any truncation gotchas, and to force recomputation of PHKDF if this tag is varied.

    0-112 bytes can be handled in a constant number of cryptographic operations. Overages incur a cost of one SHA-256 block per 64 bytes.

data G3PInputArgs Source #

The username and password are grouped together because they are normally expected to be supplied by users or other observers of a deployment.

Furthermore, the credentials vector is here because it is an ideal location to include other user input. For example, one could implement a Two-Secret Key Derivation (2SKD) scheme analogous to 1Password's.

A deployment can also specify additional constant tags as part of the credentials vector. As the plaintext of these tags is only ever hashed into the output a single time, this alongside the bcrypt tags are the least expensive pay-as-you-go options for plaintext tagging.

Note that the username and password are subjected to additional length hardening. The G3P operates in a constant number of SHA256 blocks so long as the combined length of the username and password is less than about 3 KiB, or the combined length of the username, password, and long tag is less than about 8 KiB. The actual numbers are somewhat less in both cases, but this is a reasonable approximation. Note that the bcrypt tags can subtract up to 114 bytes from the 8 KiB total, and don't effect the 3 KiB total.

In the case of all of the inputs in this record, longer values incur one SHA256 block per 64 bytes.

Constructors

G3PInputArgs 

Fields

newtype G3PInputRole Source #

These parameters are used to tweak the final output, without redoing any expensive key stretching. A possible use case is including a high entropy secret in the role itself that isn't available until after a successful stage of authentication.

Since these parameters are processed in a context that could conceivably be performance sensitive, we don't apply any length padding or side-channel hardening. Instead we opt for maximizing free tagging space. Thus we want to avoid incurring additional SHA256 block computations, one of the favorite techniques employed by the key-stretching phase of the G3P to harden against timing side-channels.

A deployment could conceivably harden this expansion phase against timing side channels themselves, if the were sufficiently inclined. There are several techniques. For starters, a deployment could ensure that these parameters themselves are constant-length. Alternatively, a deployment could specify an additional variable-length string in the role vector, used to control the ending position relative to the SHA256 buffer.

Constructors

G3PInputRole 

Fields

  • g3pInputRole_roleTags :: Vector ByteString

    This is the least expensive parameter that will vary the secret HMAC key used to generate the final output. Very much analogous to HKDF's initial keying material (ikm) parameter. This is the recommended last call for mixing additional secrets into the output.

newtype G3PInputEcho Source #

Constructors

G3PInputEcho 

Fields

  • g3pInputEcho_echoTag :: ByteString

    the absolute least expensive parameter to vary, if your implementation supports it. Very much analogous to HKDF's info parameter. 0-19 bytes are free. This incurs a cost of one SHA-256 block per output block at 20 bytes and every 64 bytes thereafter.

data G3PSeed Source #

A plain-old-data explicit representation of the intermediate g3pHash computation after the G3PInputBlock and G3PInputArgs have been processed and key stretching has been completed, but before the tweaks have been applied and the final output generated.

If you ever need to serialize or persist a seed, you probably want this.

Intended to be generated by g3pHash_seedInit and consumed by g3pHash_seedFinalize.

Constructors

G3PSeed 

Fields

Instances

Instances details
Eq G3PSeed Source # 
Instance details

Defined in Crypto.G3P

Methods

(==) :: G3PSeed -> G3PSeed -> Bool #

(/=) :: G3PSeed -> G3PSeed -> Bool #

data G3PKey Source #

Instances

Instances details
Eq G3PKey Source # 
Instance details

Defined in Crypto.G3P

Methods

(==) :: G3PKey -> G3PKey -> Bool #

(/=) :: G3PKey -> G3PKey -> Bool #

g3pHash :: G3PInputBlock -> G3PInputArgs -> G3PInputRole -> G3PInputEcho -> Stream ByteString Source #

The Global Password Prehash Protocol (G3P). Note that this function is very intentionally implemented in such a way that the following idiom is efficient. It performs the expensive key stretching phase only once.

 let mySeed = g3pHash block args
     myKey0 = mySeed myRole0
     myKey1 = mySeed myRole1
  in [ myKey0 myEcho , myKey0 altEcho, myKey1 myEcho, myKey1 altEcho ]

This expression also only performs 2 output key computations, though this is very fast compared to the stretching applied to the seed. It's still slower than varying only the echo tag. Thus we end up with four cryptographically independent bytestreams.

In the case that you want or need to persist or serialize the intermediate seed, or change the seguid or domain tag before final output expansion, then the plain-old-datatype G3PSeed and its companion functions g3pHash_seedInit and g3pHash_seedFinalize are needed.

g3pHash_seedInit :: G3PInputBlock -> G3PInputArgs -> G3PSeed Source #

This generates a seed, which encapsulates the expensive key-stretching component of g3pHash into a reusable, tweakable cryptographic value. This function is way slower than it's companion, g3pHash_seedFinalize. Broadly comparable to HKDF-Extract, though with key stretching built-in.

g3pHash_keyInit :: G3PInputRole -> G3PSeed -> G3PKey Source #

This consumes a seed and tweaks to produce the final output stream. This function is the output expansion phase of g3pHash. This function is way faster than it's companion g3pHash_seedInit. Broadly comparable to HKDF. Note that this function ignores g3pSeed_seguid in favor of g3pSeed_seguidKey.