\section{K-buckets}

K-buckets is a data structure for efficiently storing a set of nodes close to a
certain key called the base key.  The base key is constant throughout the
lifetime of a k-buckets instance.

\begin{code}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE NamedFieldPuns             #-}
{-# LANGUAGE Trustworthy                #-}
module Network.Tox.DHT.KBuckets where

import           Control.Applicative           ((<$>))
import           Data.Binary                   (Binary)
import           Data.Map                      (Map)
import qualified Data.Map                      as Map
import           Data.Ord                      (comparing)
import           Data.Word                     (Word8)
import           Test.QuickCheck.Arbitrary     (Arbitrary, arbitrary)
import           Test.QuickCheck.Gen           (Gen)
import qualified Test.QuickCheck.Gen           as Gen

import           Network.Tox.Crypto.Key        (PublicKey)
import qualified Network.Tox.DHT.Distance      as Distance
import           Network.Tox.NodeInfo.NodeInfo (NodeInfo)
import qualified Network.Tox.NodeInfo.NodeInfo as NodeInfo


{-------------------------------------------------------------------------------
 -
 - :: Implementation.
 -
 ------------------------------------------------------------------------------}

\end{code}

A k-buckets is a map from small integers \texttt{0 <= n < 256} to a set of up
to \texttt{k} Node Infos.  The set is called a bucket.  \texttt{k} is called
the bucket size.  The default bucket size is 8.

\begin{code}


data KBuckets = KBuckets
  { bucketSize :: Int
  , buckets    :: Map KBucketIndex KBucket
  , baseKey    :: PublicKey
  }
  deriving (Eq, Read, Show)


defaultBucketSize :: Int
defaultBucketSize = 8


empty :: PublicKey -> KBuckets
empty = KBuckets defaultBucketSize Map.empty


\end{code}

The number \texttt{n} is the bucket index.  It is positive integer with the
range \texttt{[0, 255]}, i.e. the range of an 8 bit unsigned integer.

\begin{code}


newtype KBucketIndex = KBucketIndex Word8
  deriving (Eq, Ord, Read, Show, Num, Binary, Enum)


\end{code}

A bucket entry is an element of the bucket.  The bucket is an ordered set, and
the entries are sorted by \href{#distance}{distance} from the base key.  Thus,
the first (smallest) element of the set is the closest one to the base key in
that set, the last (greatest) element is the furthest away.

\begin{code}


newtype KBucket = KBucket
  { bucketNodes :: Map PublicKey KBucketEntry
  }
  deriving (Eq, Read, Show)


emptyBucket :: KBucket
emptyBucket = KBucket Map.empty


bucketIsEmpty :: KBucket -> Bool
bucketIsEmpty = Map.null . bucketNodes


data KBucketEntry = KBucketEntry
  { entryBaseKey :: PublicKey
  , entryNode    :: NodeInfo
  }
  deriving (Eq, Read, Show)

instance Ord KBucketEntry where
  compare = comparing distance
    where
      distance entry =
        Distance.xorDistance
          (entryBaseKey entry)
          (NodeInfo.publicKey $ entryNode entry)


entryPublicKey :: KBucketEntry -> PublicKey
entryPublicKey = NodeInfo.publicKey . entryNode


\end{code}

\subsection{Bucket Index}

The bucket index can be computed using the following function:
\texttt{bucketIndex(baseKey, nodeKey) = 255 - log_2(distance(baseKey,
nodeKey))}.  This function is not defined when \texttt{baseKey == nodeKey},
meaning k-buckets will never contain a Node Info about the local node.

Thus, each k-bucket contains only Node Infos for whose keys the following
holds: if node with key \texttt{nodeKey} is in k-bucket with index \texttt{n},
then \texttt{bucketIndex(baseKey, nodeKey) == n}.

The bucket index can be efficiently computed by determining the first bit at
which the two keys differ, starting from the most significant bit.  So, if the
local DHT key starts with e.g. \texttt{0x80} and the bucketed node key starts
with \texttt{0x40}, then the bucket index for that node is 0.  If the second
bit differs, the bucket index is 1.  If the keys are almost exactly equal and
only the last bit differs, the bucket index is 255.

\begin{code}


bucketIndex :: PublicKey -> PublicKey -> Maybe KBucketIndex
bucketIndex pk1 pk2 =
  fmap (\index -> 255 - fromIntegral index) $ Distance.log2 $ Distance.xorDistance pk1 pk2


\end{code}

\subsection{Manipulating k-buckets}

Any update or lookup operation on a k-buckets instance that involves a single
node requires us to first compute the bucket index for that node.  An update
involving a Node Info with \texttt{nodeKey == baseKey} has no effect.  If the
update results in an empty bucket, that bucket is removed from the map.

\begin{code}


updateBucketForKey :: KBuckets -> PublicKey -> (KBucket -> KBucket) -> KBuckets
updateBucketForKey kBuckets key f =
  case bucketIndex (baseKey kBuckets) key of
    Nothing    -> kBuckets
    Just index -> updateBucketForIndex kBuckets index f


updateBucketForIndex :: KBuckets -> KBucketIndex -> (KBucket -> KBucket) -> KBuckets
updateBucketForIndex kBuckets@KBuckets { buckets } index f =
  let
    -- Find the old bucket or create a new empty one.
    updatedBucket = f $ Map.findWithDefault emptyBucket index buckets
    -- Replace old bucket with updated bucket or delete if empty.
    updatedBuckets =
      if bucketIsEmpty updatedBucket
      then Map.delete index buckets
      else Map.insert index updatedBucket buckets
  in
  kBuckets { buckets = updatedBuckets }


\end{code}

A bucket is \textit{full} when the bucket contains the maximum number of
entries configured by the bucket size.

A node is \textit{viable} for entry if the bucket is not \textit{full} or the
node's public key has a lower distance from the base key than the current entry
with the greatest distance.

If a node is \textit{viable} and the bucket is \textit{full}, the entry with
the greatest distance from the base key is removed to keep the bucket size
below the maximum configured bucket size.

Adding a node whose key already exists will result in an update of the Node
Info in the bucket.  Removing a node for which no Node Info exists in the
k-buckets has no effect.  Thus, removing a node twice is permitted and has the
same effect as removing it once.

\begin{code}


addNode :: NodeInfo -> KBuckets -> KBuckets
addNode nodeInfo kBuckets =
  updateBucketForKey kBuckets (NodeInfo.publicKey nodeInfo) $ \bucket ->
    let
      -- The new entry.
      entry = KBucketEntry (baseKey kBuckets) nodeInfo
    in
    -- Insert the entry into the bucket.
    addNodeToBucket (bucketSize kBuckets) entry bucket


addNodeToBucket :: Int -> KBucketEntry -> KBucket -> KBucket
addNodeToBucket maxSize entry =
  KBucket . truncateMap maxSize . Map.insert (entryPublicKey entry) entry . bucketNodes


truncateMap :: Ord a => Int -> Map k a -> Map k a
truncateMap maxSize m
  | Map.size m <= maxSize = m
  | otherwise =
      -- Remove the greatest element until the map is small enough again.
      truncateMap maxSize $ Map.deleteMax m


removeNodeFromBucket :: PublicKey -> KBucket -> KBucket
removeNodeFromBucket publicKey =
  KBucket . Map.delete publicKey . bucketNodes


removeNode :: PublicKey -> KBuckets -> KBuckets
removeNode publicKey kBuckets =
  updateBucketForKey kBuckets publicKey $ \bucket ->
    removeNodeFromBucket publicKey bucket

\end{code}

Iteration order of a k-buckets instance is in order of distance from the base
key.  I.e. the first node seen in iteration is the closest, and the last node
is the furthest away in terms of the distance metric.

\begin{code}

foldNodes :: (a -> NodeInfo -> a) -> a -> KBuckets -> a
foldNodes f x =
  foldl f x . concatMap (map entryNode . Map.elems . bucketNodes) . Map.elems . buckets


{-------------------------------------------------------------------------------
 -
 - :: Tests.
 -
 ------------------------------------------------------------------------------}


getAllNodes :: KBuckets -> [NodeInfo]
getAllNodes =
  concatMap (map entryNode . Map.elems . bucketNodes) . Map.elems . buckets


genKBuckets :: PublicKey -> Gen KBuckets
genKBuckets publicKey =
  foldl (flip addNode) (empty publicKey) <$> Gen.listOf arbitrary


instance Arbitrary KBuckets where
  arbitrary = arbitrary >>= genKBuckets
\end{code}