\section{K-buckets} K-buckets is a data structure for efficiently storing a set of nodes close to a certain key called the base key. The base key is constant throughout the lifetime of a k-buckets instance. \begin{code} {-# LANGUAGE GeneralizedNewtypeDeriving #-} {-# LANGUAGE NamedFieldPuns #-} {-# LANGUAGE Trustworthy #-} module Network.Tox.DHT.KBuckets where import Control.Applicative (Applicative, (<$>)) import Data.Binary (Binary) import Data.Foldable (toList) import Data.List (sortBy) import Data.Map (Map) import qualified Data.Map as Map import Data.Maybe (isJust) import Data.Ord (comparing) import Data.Traversable (Traversable, mapAccumR, traverse) import Data.Word (Word8) import Test.QuickCheck.Arbitrary (Arbitrary, arbitrary) import Test.QuickCheck.Gen (Gen) import qualified Test.QuickCheck.Gen as Gen import Network.Tox.Crypto.Key (PublicKey) import Network.Tox.DHT.ClientList (ClientList) import qualified Network.Tox.DHT.ClientList as ClientList import Network.Tox.DHT.Distance (Distance) import qualified Network.Tox.DHT.Distance as Distance import Network.Tox.NodeInfo.NodeInfo (NodeInfo) import qualified Network.Tox.NodeInfo.NodeInfo as NodeInfo import Network.Tox.Time (Timestamp) {------------------------------------------------------------------------------- - - :: Implementation. - ------------------------------------------------------------------------------} \end{code} A k-buckets is a map from small integers \texttt{0 <= n < 256} to Client Lists of maximum size $k$. Each Client List is called a (k-)bucket. A k-buckets is equipped with a base key, and each bucket has this key as its base key. \texttt{k} is called the bucket size. The default bucket size is 8. A large bucket size was chosen to increase the speed at which peers are found. \begin{code} data KBuckets = KBuckets { bucketSize :: Int , buckets :: Map KBucketIndex ClientList , baseKey :: PublicKey } deriving (Eq, Read, Show) defaultBucketSize :: Int defaultBucketSize = 8 empty :: PublicKey -> KBuckets empty = KBuckets defaultBucketSize Map.empty \end{code} The above number \texttt{n} is the bucket index. It is a non-negative integer with the range \texttt{[0, 255]}, i.e. the range of an 8 bit unsigned integer. \begin{code} newtype KBucketIndex = KBucketIndex Word8 deriving (Eq, Ord, Read, Show, Num, Binary, Enum) \end{code} \subsection{Bucket Index} The index of the bucket can be computed using the following function: \texttt{bucketIndex(baseKey, nodeKey) = 255 - log\_2(distance(baseKey, nodeKey))}. This function is not defined when \texttt{baseKey == nodeKey}, meaning k-buckets will never contain a Node Info about the base node. Thus, each k-bucket contains only Node Infos for whose keys the following holds: if node with key \texttt{nodeKey} is in k-bucket with index \texttt{n}, then \texttt{bucketIndex(baseKey, nodeKey) == n}. Thus, n'th k-bucket consists of nodes for which distance to the base node lies in range \verb![2^n, 2^(n+1) - 1]!. The bucket index can be efficiently computed by determining the first bit at which the two keys differ, starting from the most significant bit. So, if the local DHT key starts with e.g. \texttt{0x80} and the bucketed node key starts with \texttt{0x40}, then the bucket index for that node is 0. If the second bit differs, the bucket index is 1. If the keys are almost exactly equal and only the last bit differs, the bucket index is 255. \begin{code} bucketIndex :: PublicKey -> PublicKey -> Maybe KBucketIndex bucketIndex pk1 pk2 = fmap (\index -> 255 - fromIntegral index) $ Distance.log2 $ Distance.xorDistance pk1 pk2 \end{code} \subsection{Manipulating k-buckets} TODO: this is different from kademlia's least-recently-seen eviction policy; why the existing solution was chosen, how does it affect security, performance and resistance to poisoning? original paper claims that preference of old live nodes results in better persistence and resistance to basic DDoS attacks; Any update or lookup operation on a k-buckets instance that involves a single node requires us to first compute the bucket index for that node. An update involving a Node Info with \texttt{nodeKey == baseKey} has no effect. If the update results in an empty bucket, that bucket is removed from the map. \begin{code} updateBucketForKey :: KBuckets -> PublicKey -> (ClientList -> ClientList) -> KBuckets updateBucketForKey kBuckets key f = case bucketIndex (baseKey kBuckets) key of Nothing -> kBuckets Just index -> updateBucketForIndex kBuckets index f updateBucketForIndex :: KBuckets -> KBucketIndex -> (ClientList -> ClientList) -> KBuckets updateBucketForIndex kBuckets@KBuckets { buckets, baseKey, bucketSize } index f = let -- Find the old bucket or create a new empty one. updatedBucket = f $ Map.findWithDefault (ClientList.empty baseKey bucketSize) index buckets -- Replace old bucket with updated bucket or delete if empty. updatedBuckets = if ClientList.isEmpty updatedBucket then Map.delete index buckets else Map.insert index updatedBucket buckets in kBuckets { buckets = updatedBuckets } \end{code} Adding a node to, or removing a node from, a k-buckets consists of performing the corresponding operation on the Client List bucket whose index is that of the node's public key, except that adding a new node to a full bucket has no effect. A node is considered \textit{viable} for entry if the corresponding bucket is not full. \begin{code} addNode :: Timestamp -> NodeInfo -> KBuckets -> KBuckets addNode time nodeInfo kBuckets = updateBucketForKey kBuckets publicKey $ \clientList -> let full = ClientList.full clientList alreadyIn = isJust $ ClientList.lookup publicKey clientList in if not full || alreadyIn then ClientList.addNode time nodeInfo clientList else clientList where publicKey = NodeInfo.publicKey nodeInfo removeNode :: PublicKey -> KBuckets -> KBuckets removeNode publicKey kBuckets = updateBucketForKey kBuckets publicKey $ ClientList.removeNode publicKey viable :: NodeInfo -> KBuckets -> Bool viable nodeInfo KBuckets{ baseKey, buckets } = case bucketIndex baseKey $ NodeInfo.publicKey nodeInfo of Nothing -> False Just index -> case Map.lookup index buckets of Nothing -> True Just bucket -> not $ ClientList.full bucket \end{code} Iteration order of a k-buckets instance is in order of distance from the base key. I.e. the first node seen in iteration is the closest, and the last node is the furthest away in terms of the distance metric. \begin{code} traverseClientLists :: Applicative f => (ClientList -> f ClientList) -> KBuckets -> f KBuckets traverseClientLists f kBuckets@KBuckets{ buckets } = (\x -> kBuckets{ buckets = x }) <$> traverse f (reverseT buckets) where reverseT :: (Traversable t) => t a -> t a reverseT t = snd (mapAccumR (\ (x:xs) _ -> (xs, x)) (toList t) t) closeNodes :: PublicKey -> KBuckets -> [ (Distance, NodeInfo) ] closeNodes publicKey KBuckets{ baseKey, buckets } = let (further, at, nearer) = case bucketIndex baseKey publicKey of Nothing -> (buckets, Nothing, Map.empty) Just index -> Map.splitLookup index buckets clientClose = ClientList.closeNodes publicKey bucketsClose = sortBy (comparing fst) . concatMap clientClose in concat [ maybe [] clientClose at , bucketsClose $ Map.elems nearer , bucketsClose $ Map.elems further ] {------------------------------------------------------------------------------- - - :: Tests. - ------------------------------------------------------------------------------} getAllNodes :: KBuckets -> [NodeInfo] getAllNodes = concatMap ClientList.nodeInfos . Map.elems . buckets genKBuckets :: PublicKey -> Gen KBuckets genKBuckets publicKey = foldl (flip $ uncurry addNode) (empty publicKey) <$> Gen.listOf arbitrary instance Arbitrary KBuckets where arbitrary = arbitrary >>= genKBuckets \end{code}