\section{K-buckets}
K-buckets is a data structure for efficiently storing a set of nodes close to a
certain key called the base key. The base key is constant throughout the
lifetime of a k-buckets instance.
\begin{code}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE NamedFieldPuns #-}
{-# LANGUAGE Trustworthy #-}
module Network.Tox.DHT.KBuckets where
import Control.Applicative (Applicative, (<$>))
import Data.Binary (Binary)
import Data.Foldable (toList)
import Data.List (sortBy)
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (isJust)
import Data.Ord (comparing)
import Data.Traversable (Traversable, mapAccumR,
traverse)
import Data.Word (Word8)
import Test.QuickCheck.Arbitrary (Arbitrary, arbitrary)
import Test.QuickCheck.Gen (Gen)
import qualified Test.QuickCheck.Gen as Gen
import Network.Tox.Crypto.Key (PublicKey)
import Network.Tox.DHT.ClientList (ClientList)
import qualified Network.Tox.DHT.ClientList as ClientList
import Network.Tox.DHT.Distance (Distance)
import qualified Network.Tox.DHT.Distance as Distance
import Network.Tox.NodeInfo.NodeInfo (NodeInfo)
import qualified Network.Tox.NodeInfo.NodeInfo as NodeInfo
import Network.Tox.Time (Timestamp)
\end{code}
A k-buckets is a map from small integers \texttt{0 <= n < 256} to Client Lists
of maximum size $k$. Each Client List is called a (k-)bucket. A k-buckets is
equipped with a base key, and each bucket has this key as its base key.
\texttt{k} is called the bucket size. The default bucket size is 8.
A large bucket size was chosen to increase the speed at which peers are found.
\begin{code}
data KBuckets = KBuckets
{ bucketSize :: Int
, buckets :: Map KBucketIndex ClientList
, baseKey :: PublicKey
}
deriving (Eq, Read, Show)
defaultBucketSize :: Int
defaultBucketSize = 8
empty :: PublicKey -> KBuckets
empty = KBuckets defaultBucketSize Map.empty
\end{code}
The above number \texttt{n} is the bucket index. It is a non-negative integer
with the range \texttt{[0, 255]}, i.e. the range of an 8 bit unsigned integer.
\begin{code}
newtype KBucketIndex = KBucketIndex Word8
deriving (Eq, Ord, Read, Show, Num, Binary, Enum)
\end{code}
\subsection{Bucket Index}
The index of the bucket can be computed using the following function:
\texttt{bucketIndex(baseKey, nodeKey) = 255 - log\_2(distance(baseKey,
nodeKey))}. This function is not defined when \texttt{baseKey == nodeKey},
meaning k-buckets will never contain a Node Info about the base node.
Thus, each k-bucket contains only Node Infos for whose keys the following
holds: if node with key \texttt{nodeKey} is in k-bucket with index \texttt{n},
then \texttt{bucketIndex(baseKey, nodeKey) == n}. Thus, n'th k-bucket consists
of nodes for which distance to the base node lies in range
\verb![2^n, 2^(n+1) - 1]!.
The bucket index can be efficiently computed by determining the first bit at
which the two keys differ, starting from the most significant bit. So, if the
local DHT key starts with e.g. \texttt{0x80} and the bucketed node key starts
with \texttt{0x40}, then the bucket index for that node is 0. If the second
bit differs, the bucket index is 1. If the keys are almost exactly equal and
only the last bit differs, the bucket index is 255.
\begin{code}
bucketIndex :: PublicKey -> PublicKey -> Maybe KBucketIndex
bucketIndex pk1 pk2 =
fmap (\index -> 255 - fromIntegral index) $ Distance.log2 $ Distance.xorDistance pk1 pk2
\end{code}
\subsection{Manipulating k-buckets}
TODO: this is different from kademlia's least-recently-seen eviction policy; why
the existing solution was chosen, how does it affect security, performance and
resistance to poisoning? original paper claims that preference of old live nodes
results in better persistence and resistance to basic DDoS attacks;
Any update or lookup operation on a k-buckets instance that involves a single
node requires us to first compute the bucket index for that node. An update
involving a Node Info with \texttt{nodeKey == baseKey} has no effect. If the
update results in an empty bucket, that bucket is removed from the map.
\begin{code}
updateBucketForKey :: KBuckets -> PublicKey -> (ClientList -> ClientList) -> KBuckets
updateBucketForKey kBuckets key f =
case bucketIndex (baseKey kBuckets) key of
Nothing -> kBuckets
Just index -> updateBucketForIndex kBuckets index f
updateBucketForIndex :: KBuckets -> KBucketIndex -> (ClientList -> ClientList) -> KBuckets
updateBucketForIndex kBuckets@KBuckets { buckets, baseKey, bucketSize } index f =
let
updatedBucket = f $ Map.findWithDefault (ClientList.empty baseKey bucketSize) index buckets
updatedBuckets =
if ClientList.isEmpty updatedBucket
then Map.delete index buckets
else Map.insert index updatedBucket buckets
in
kBuckets { buckets = updatedBuckets }
\end{code}
Adding a node to, or removing a node from, a k-buckets consists of performing
the corresponding operation on the Client List bucket whose index is that of
the node's public key, except that adding a new node to a full bucket has no
effect. A node is considered \textit{viable} for entry if the corresponding
bucket is not full.
\begin{code}
addNode :: Timestamp -> NodeInfo -> KBuckets -> KBuckets
addNode time nodeInfo kBuckets =
updateBucketForKey kBuckets publicKey $ \clientList ->
let
full = ClientList.full clientList
alreadyIn = isJust $ ClientList.lookup publicKey clientList
in
if not full || alreadyIn
then ClientList.addNode time nodeInfo clientList
else clientList
where
publicKey = NodeInfo.publicKey nodeInfo
removeNode :: PublicKey -> KBuckets -> KBuckets
removeNode publicKey kBuckets =
updateBucketForKey kBuckets publicKey $ ClientList.removeNode publicKey
viable :: NodeInfo -> KBuckets -> Bool
viable nodeInfo KBuckets{ baseKey, buckets } =
case bucketIndex baseKey $ NodeInfo.publicKey nodeInfo of
Nothing -> False
Just index -> case Map.lookup index buckets of
Nothing -> True
Just bucket -> not $ ClientList.full bucket
\end{code}
Iteration order of a k-buckets instance is in order of distance from the base
key. I.e. the first node seen in iteration is the closest, and the last node
is the furthest away in terms of the distance metric.
\begin{code}
traverseClientLists ::
Applicative f => (ClientList -> f ClientList) -> KBuckets -> f KBuckets
traverseClientLists f kBuckets@KBuckets{ buckets } =
(\x -> kBuckets{ buckets = x }) <$> traverse f (reverseT buckets)
where
reverseT :: (Traversable t) => t a -> t a
reverseT t = snd (mapAccumR (\ (x:xs) _ -> (xs, x)) (toList t) t)
closeNodes :: PublicKey -> KBuckets -> [ (Distance, NodeInfo) ]
closeNodes publicKey KBuckets{ baseKey, buckets } =
let
(further, at, nearer) = case bucketIndex baseKey publicKey of
Nothing -> (buckets, Nothing, Map.empty)
Just index -> Map.splitLookup index buckets
clientClose = ClientList.closeNodes publicKey
bucketsClose = sortBy (comparing fst) . concatMap clientClose
in
concat
[ maybe [] clientClose at
, bucketsClose $ Map.elems nearer
, bucketsClose $ Map.elems further
]
getAllNodes :: KBuckets -> [NodeInfo]
getAllNodes =
concatMap ClientList.nodeInfos . Map.elems . buckets
genKBuckets :: PublicKey -> Gen KBuckets
genKBuckets publicKey =
foldl (flip $ uncurry addNode) (empty publicKey) <$> Gen.listOf arbitrary
instance Arbitrary KBuckets where
arbitrary = arbitrary >>= genKBuckets
\end{code}