We had a need for identifiers that could be used by humans.
The requirement to be able to say these over the phone complicates matters. Most people have approached this problem by using a phonetic alphabet. The trouble comes when you hear people saying stuff like "A as in ... uh, Apple?" (should be Alpha, of course) and "U as in ... um, what's a word that starts with U?" It gets worse. Ever been to a GPG keysigning? Listen to people attempt to read out the digits of their key fingerprints. ...C 3 E D 0 0 0 0 0 0 0 2 B D B D... "Did you say 'C' or 'D'?" and "how many zeros was that?" Brutal.
So what we need is a symbol set where each digit is unambiguous and doesn't collide with the phonetics of another symbol. This package provides Locator16, a set of 16 letters and numbers that, when spoken in English, have unique pronunciations.
Also included is code to work in base 62, which is simply ['0'..'9'] ++ ['A'..'Z'] ++ ['a'..'z']. Base 62 numbers are frequently used to express short codes in URL redirectors; you may find them a more useful encoding for expressing numbers than base 16 hexadecimal.
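A minimal sketch of base 62 conversion, assuming the alphabet is ordered digits first, then upper case, then lower case. The names `encode62` and `decode62` are hypothetical; this illustrates the technique and is not the package's `toBase62`/`fromBase62` source.

```haskell
import Data.List (elemIndex, foldl')

-- Assumed ordering: '0'..'9' are 0-9, 'A'..'Z' are 10-35, 'a'..'z' are 36-61.
base62Alphabet :: String
base62Alphabet = ['0' .. '9'] ++ ['A' .. 'Z'] ++ ['a' .. 'z']

-- Hypothetical encoder: repeated division by 62, most significant digit first.
encode62 :: Integer -> String
encode62 n
  | n < 0 = error "encode62: negative input"
  | n == 0 = "0"
  | otherwise = go n ""
  where
    go 0 acc = acc
    go k acc =
      let (q, r) = k `quotRem` 62
       in go q (base62Alphabet !! fromIntegral r : acc)

-- Hypothetical decoder: Horner's rule over the digit values.
decode62 :: String -> Integer
decode62 = foldl' step 0
  where
    step acc c = case elemIndex c base62Alphabet of
      Just d -> acc * 62 + fromIntegral d
      Nothing -> error ("decode62: invalid digit " ++ show c)
```

For any non-negative `n`, `decode62 (encode62 n) == n`; for example 3843 (61 × 62 + 61) encodes as "zz".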
- class (Ord α, Enum α, Bounded α) => Locator α where
- data English16
- fromLocator16 :: String -> Int
- toLocator16 :: Int -> String
- toLocator16a :: Int -> Int -> String
- hashStringToLocator16a :: Int -> ByteString -> ByteString
- toBase62 :: Integer -> String
- fromBase62 :: String -> Integer
- padWithZeros :: Int -> String -> String
- hashStringToBase62 :: Int -> ByteString -> ByteString
This was somewhat inspired by the record locators used by the civilian air travel industry, but with the restrictions that the symbol set is carefully chosen (aviation locators do heroic things like excluding 'I' but not much else) and, in the case of Locator16a, that no symbol is repeated. Locator16a is not a reversible encoding, but assuming you're just generating identifiers and storing them somewhere, it's quite handy.
TODO link to paper with pronunciation study when published.
A symbol set with sixteen uniquely pronounceable digits.
The fact there are sixteen symbols is more an indication of a certain degree of bullheadedness on the part of the author, and less of any kind of actual requirement. We might have a slightly better readback score if we dropped to 15 or 14 unique characters. It does mean you can match up with hexadecimal, which is not entirely without merit.
The grouping of letters and numbers was the hard part; having come up with the set and deconflicted the choices, the ordering is then entirely arbitrary. Since there are some numbers, might as well have them at the same place they correspond to in base 10; the letters were then allocated in alphabetical order in the remaining slots.
Given a number encoded in Locator16, convert it back to an integer.
Given a number, convert it to a string in the Locator16 base 16 symbol alphabet. You can use this as a replacement for the standard '0'-'9' 'A'-'F' symbols traditionally used to express hexadecimal, though really the fact that we came up with 16 total unique symbols was a nice coincidence, not a requirement.
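Converting to and from Locator16 is ordinary base 16 positional arithmetic; only the symbols differ. The sketch below is parameterized by a 16-symbol alphabet and, because the exact English16 ordering is not spelled out here, uses plain hexadecimal as a stand-in. `hexSymbols`, `toBase16With` and `fromBase16With` are hypothetical names, not the package's API.

```haskell
import Data.List (elemIndex, foldl')

-- Stand-in alphabet: ordinary hexadecimal. A Locator16 version would list its
-- own sixteen symbols here, in enum order.
hexSymbols :: String
hexSymbols = "0123456789ABCDEF"

-- Render a non-negative Int in base 16 using the given 16-symbol alphabet.
toBase16With :: String -> Int -> String
toBase16With symbols n
  | n < 0 = error "toBase16With: negative input"
  | n == 0 = take 1 symbols
  | otherwise = go n ""
  where
    go 0 acc = acc
    go k acc =
      let (q, r) = k `quotRem` 16
       in go q (symbols !! r : acc)

-- Parse a string written in the given alphabet back into an Int.
fromBase16With :: String -> String -> Int
fromBase16With symbols = foldl' step 0
  where
    step acc c = case elemIndex c symbols of
      Just d -> acc * 16 + d
      Nothing -> error ("fromBase16With: unknown symbol " ++ show c)
```

Swapping the alphabet string is the whole difference between hexadecimal and a Locator16-style rendering; the division and Horner steps stay the same.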
Represent a number in Locator16a format. This uses the Locator16 symbol set, and additionally specifies that no symbol can be repeated. The 'a' in Locator16a indicates that this transformation is done on the cheap: when converting, if we end up with '9' '9' we simply pick the subsequent digit in the enum, in this case getting you '9' 'K'.
Note that the transformation is not reversible. A number like 4369 (which is 0x1111, incidentally) encodes as 12C4. So do 4370, 4371, and 4372. The point is not uniqueness, but readability in adverse conditions. So while you can count locators, they don't map continuously to the integers.
The first argument (limit) is the number of digits you'd like in the locator; if the number passed in is less than 16^limit, the result will be padded.
>>> toLocator16a 6 4369
12C40F
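A sketch of the no-repeat rule, working on digit indices 0-15 rather than English16 symbols. It reads "no symbol can be repeated" as applying across the whole locator: each digit already used earlier is bumped to the next unused index. That reading is consistent with the example of 0x1111 encoding as 12C4 (indices 1, 2, 3, 4, with C at index 3). All names are hypothetical, wrapping from 15 back to 0 is an assumption, and the real function's padding order is not reproduced.

```haskell
import Data.List (mapAccumL)
import qualified Data.Set as Set

-- Split a non-negative Int into exactly `limit` base-16 digit indices,
-- most significant first (left-padded with zeros).
hexDigits :: Int -> Int -> [Int]
hexDigits limit n = [(n `div` 16 ^ i) `mod` 16 | i <- [limit - 1, limit - 2 .. 0]]

-- Step to the next symbol in enum order. Wrapping from 15 back to 0 is an
-- assumption of this sketch.
nextSymbol :: Int -> Int
nextSymbol d = (d + 1) `mod` 16

-- "No symbol can be repeated": if a digit was already used anywhere earlier
-- in the locator, keep stepping forward until an unused one turns up.
-- Limited to 16 digits, otherwise no unused symbol would remain.
noRepeats :: [Int] -> [Int]
noRepeats ds
  | length ds > 16 = error "noRepeats: more than 16 digits"
  | otherwise = snd (mapAccumL pick Set.empty ds)
  where
    pick used d
      | d `Set.member` used = pick used (nextSymbol d)
      | otherwise = (Set.insert d used, d)
```

Both `noRepeats (hexDigits 4 4369)` and `noRepeats (hexDigits 4 4372)` give `[1,2,3,4]`, which shows concretely why several inputs collapse onto one locator.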
Take an arbitrary sequence of bytes, hash it with SHA1, then format it as a Locator16a string of the requested number of digits (the first argument).
>>> hashStringToLocator16a 6 "Hello World"
M48HR0
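SHA1 itself is not in base (a package such as cryptohash-sha1 provides it), so this sketch covers only the step after hashing: reading the 20-byte digest as a number that the digit conversion can consume. Treating the digest as one big-endian unsigned integer and reducing it modulo 16^limit are assumptions of this sketch; `digestToInteger` and `truncateToDigits` are hypothetical names, and the package may reduce the digest differently.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- Interpret a digest as one unsigned big-endian integer (assumed).
digestToInteger :: BS.ByteString -> Integer
digestToInteger = BS.foldl' step 0
  where
    step :: Integer -> Word8 -> Integer
    step acc w = acc * 256 + fromIntegral w

-- Keep only the low-order `limit` hexadecimal digits' worth of the value
-- (hypothetical reduction step).
truncateToDigits :: Int -> Integer -> Integer
truncateToDigits limit x = x `mod` (16 ^ limit)
```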
Utility function to prepend '0' characters to a string representing a number. This allows you to ensure a fixed width for numbers that are less than the desired width in size. This comes up frequently when representing numbers in other bases greater than 10 as they are inevitably presented as text, and not having them evenly justified can (at best) be ugly and (at worst) actually lead to parsing and conversion bugs.
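A minimal sketch of that padding, under a hypothetical name. It assumes strings already at least as wide as requested are returned unchanged rather than truncated.

```haskell
-- Hypothetical stand-in for padWithZeros: left-pad with '0' to the given
-- width. `replicate` with a zero or negative count yields "", so strings that
-- are already wide enough are returned unchanged.
padZeros :: Int -> String -> String
padZeros width digits = replicate (width - length digits) '0' ++ digits
```

For example, `padZeros 6 "1A"` yields "00001A", which lines up with other six-digit values.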
Take an arbitrary string, hash it, then pad it with zeros to produce a base 62 string of the requested number of digits (the first argument).
You may be interested to know that the 160-bit SHA1 hash used here can be expressed without loss as 27 digits of base 62, for example:
>>> hashStringToBase62 27 "Hello World"
1T8Sj4C5jVU6iQXCwCwJEPSWX6u
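The 27-digit figure can be checked with exact integer arithmetic: 62^26 (about 2^154.8) falls short of 2^160, while 62^27 (about 2^160.8) covers it. The helper below is a hypothetical illustration, not part of the package.

```haskell
-- Least k such that base^k >= 2^bits, i.e. how many base-`base` digits it
-- takes to write every value of a `bits`-bit quantity without loss.
-- Requires base >= 2, otherwise the powers never grow and this never ends.
digitsNeeded :: Integer -> Int -> Int
digitsNeeded base bits = length (takeWhile (< 2 ^ bits) (iterate (* base) 1))
```

`digitsNeeded 62 160` is 27, and `digitsNeeded 16 160` is 40, the familiar length of a hex-encoded SHA1 digest.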