legion-0.4.0.0: Distributed, stateful, homogeneous microservice framework.

Safe Haskell: None
Language: Haskell2010

Network.Legion

Description

Legion is a mathematically sound framework for writing horizontally scalable user applications. Historically, horizontal scalability has been achieved via the property of statelessness. Programmers would design their applications to be free of any kind of persistent state, avoiding the problem of distributed state management. This almost never turns out to really be possible, so programmers achieve "statelessness" by delegating application state management to some kind of external, shared database -- which ends up having its own scalability problems.

In addition to scalability problems, which modern databases (especially NoSQL databases) have done a good job of solving, there is another, more fundamental problem facing these architectures: The application is not really stateless.

Legion is a Haskell framework that abstracts state partitioning, data replication, request routing, and cluster rebalancing, making it easy to implement large and robust distributed data applications.

Examples of services that rely on partitioning include ElasticSearch, Riak, DynamoDB, and others. In other words, almost all scalable databases.

Using Legion

Starting the Legion Runtime

While this section is being worked on, you can check out the legion-cache project for a working example of how to build a basic distributed key-value store using Legion.

forkLegionary

Arguments

:: (LegionConstraints i o s, MonadLoggerIO io) 
=> Legionary i o s

The user-defined legion application to run.

-> RuntimeSettings

Settings and configuration of the legionary framework.

-> StartupMode 
-> io (Runtime i o) 

Forks the legion framework in a background thread, and returns a way to send user requests to it and retrieve the responses to those requests.

data StartupMode

This defines the various ways a node can be spun up.

Constructors

NewCluster

Indicates that we should bootstrap a new cluster at startup. The persistence layer may be safely pre-populated because the new node will claim the entire keyspace.

JoinCluster SockAddr

Indicates that the node should try to join an existing cluster, either by starting fresh, or by recovering from a shutdown or crash.

data Runtime i o

This type represents a handle to the runtime environment of your Legion application. This allows you to make requests and access the partition index.

Runtime is an opaque structure. Use makeRequest and search to interact with it.

Runtime Configuration

The legion framework has several operational parameters which can be controlled using configuration. These include the address binding used to expose the cluster management service endpoint and what file to use for cluster state journaling.

data RuntimeSettings

Settings used when starting up the legion framework runtime.

Constructors

RuntimeSettings 

Fields

peerBindAddr :: SockAddr

The address on which the legion framework will listen for rebalancing and cluster management commands.

joinBindAddr :: SockAddr

The address on which the legion framework will listen for cluster join requests.

adminHost :: HostPreference

The host address on which the admin service should run.

adminPort :: Port

The host port on which the admin service should run.

Making Runtime Requests

makeRequest :: MonadIO io => Runtime i o -> PartitionKey -> i -> io o

Send a user request to the legion runtime.

search :: MonadIO io => Runtime i o -> SearchTag -> Source io IndexRecord

Send a search request to the legion runtime. Returns results that are strictly greater than the provided SearchTag.

Implementing a Legion Application

Whenever you use Legion to develop a distributed application, your application is going to be divided into two major parts: the stateless part and the stateful part. The stateless part is going to be the context in which a legion node is running -- probably a web server if you are exposing your application as a web service. Legion itself is focused mainly on the stateful part, and it will do all the heavy lifting on that side of things. However, it is worth mentioning a few things about the stateless part before we move on.

The unit of state that Legion knows about is called a "partition". Each partition is identified by a PartitionKey, and it is replicated across the cluster. Each partition acts as the unit of state for handling stateful user requests which are routed to it based on the PartitionKey associated with the request. What the stateful part of Legion is not able to do is figure out what partition key is associated with the request in the first place. This is a function of the stateless part of the application. Generally speaking, the stateless part of your application is going to be responsible for:

  • Starting up the Legion runtime using forkLegionary.
  • Identifying the partition key to which a request should be applied (e.g. maybe this is some component of a URL, or else an identifier stashed in a browser cookie).
  • Marshalling application requests into requests to the Legion runtime.
  • Marshalling the Legion runtime response into an application response.

Legion doesn't really address any of these things, mainly because there are already plenty of great ways to write stateless services. What Legion does provide is a runtime that can be embedded in the stateless part of your application, that transparently handles all of the hard stateful stuff, like replication, rebalancing, request routing, etc.
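
The responsibilities listed above can be sketched roughly as follows. This is only an illustration that compiles without the legion package: PartitionKey here is a local stand-in, AppRequest, AppResponse, CacheInput, CacheOutput, toKey, serve, and render are hypothetical names, and the send argument stands in for makeRequest applied to a Runtime. The toy key derivation is an assumption for the example and is not how Legion computes keys.

```haskell
import Data.Char (ord)
import Data.List (foldl')

-- Local stand-in for Legion's PartitionKey (the real one wraps a Word256).
newtype PartitionKey = K Integer deriving (Eq, Show)

-- Hypothetical request/response types of the stateless web layer.
data AppRequest = AppRequest { reqPath :: [String], reqBody :: String }
newtype AppResponse = AppResponse String deriving (Eq, Show)

-- Hypothetical i and o types for a small cache application.
data CacheInput = Get | Put String deriving (Eq, Show)
data CacheOutput = Found String | Missing | Stored deriving (Eq, Show)

-- Toy, non-cryptographic key derivation from a URL path component.
-- The result is kept within 256 bits to mirror the width of the real key.
toKey :: String -> PartitionKey
toKey = K . foldl' step 0
  where
    step acc c = (acc * 257 + toInteger (ord c)) `mod` (2 ^ (256 :: Int))

-- Marshal the runtime's response back into an application response.
render :: CacheOutput -> AppResponse
render (Found v) = AppResponse v
render Missing   = AppResponse "miss"
render Stored    = AppResponse "stored"

-- Identify the partition key, marshal the request, call the runtime,
-- and marshal the response. `send` plays the role of `makeRequest runtime`.
serve
  :: Monad m
  => (PartitionKey -> CacheInput -> m CacheOutput)
  -> AppRequest
  -> m AppResponse
serve send req =
  case reqPath req of
    ["cache", name] -> do
      let input = if null (reqBody req) then Get else Put (reqBody req)
      render <$> send (toKey name) input
    _ -> pure (AppResponse "not found")
```

Passing the runtime call in as a function keeps the stateless layer easy to test with a fake send, which is a design choice of this sketch rather than something Legion requires.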

The only thing required to implement a legion service is to provide a request handler and a persistence layer by constructing a Legionary value and passing it to forkLegionary. The stateful part of your application will live mostly within the request handler handleRequest. If you look at handleRequest, you will see that it is abstract over the type variables i, o, and s.

handleRequest :: PartitionKey -> i -> s -> o

These are the types your application has to fill in. i stands for "input", which is the type of requests your application accepts; o stands for "output", which is the type of responses your application will generate in response to those requests, and s stands for "state", which is the application state that each partition can assume.

Implementing a request handler is pretty straightforward, but there is a little bit more to it than meets the eye. If you look at forkLegionary, you will see a constraint named LegionConstraints i o s, which is short-hand for a long list of typeclasses that your i, o, and s types are going to have to implement. Of particular interest is the ApplyDelta typeclass. If you look at handleRequest, you will see that it is defined in terms of an input, an existing state, and an output, but there is no mention of any new state that is generated as a result of handling the request.

This is where the ApplyDelta typeclass comes in. Where handleRequest takes an input and a state and produces an output, the apply function of the ApplyDelta typeclass takes an input and a state and produces a new state.

apply :: i -> s -> s

The reason that Legion splits the definition of what it means to fully handle an input into two functions like this is because of the approach it takes to solving distributed systems problems. Describing this entirely is beyond the scope of this section of documentation (TODO link to more info) but the TL;DR is that handleRequest will only get called once for each input, but apply has a very good chance of being called more than once for various reasons including re-playing the application of requests to resolve non-determinism.

Taking yet another look at handleRequest, you will see that it makes no provision for a non-existent partition state (i.e., it is written in terms of s, not Maybe s; the same goes for ApplyDelta). This framework takes the somewhat platonic philosophical view that all mathematical values exist somewhere and that there is no such thing as a non-existent partition. When you first spin up a Legion application, all of those partitions are going to have a default value, which is def (because your partition state must be an instance of the Default typeclass). This doesn't take up infinite disk space because def values are cleverly encoded as a zero-length string of bytes. ;-)
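
To make the split between handleRequest, apply, and def concrete, here is a minimal sketch of a counter application. It is written so that it compiles without the legion package: ApplyDelta, Default, and PartitionKey below are local stand-ins with the same shapes as described in this documentation, and CounterRequest, CounterResponse, CounterState, and replay are hypothetical names invented for the example.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

import Data.List (foldl')

-- Local stand-ins so the sketch compiles on its own. In a real application
-- these would come from the legion and data-default packages.
class ApplyDelta i s where
  apply :: i -> s -> s

class Default s where
  def :: s

-- Stand-in for Legion's PartitionKey (the real one wraps a Word256).
newtype PartitionKey = K Integer deriving (Eq, Show)

-- i: the requests a counter partition accepts.
data CounterRequest = Increment Int | Reset | Read
  deriving (Eq, Show)

-- o: the responses it produces.
newtype CounterResponse = CounterValue Int
  deriving (Eq, Show)

-- s: the state of a single partition.
newtype CounterState = CounterState Int
  deriving (Eq, Show)

-- Every partition starts out as zero; there is no "missing" counter.
instance Default CounterState where
  def = CounterState 0

-- apply computes the new state. It is total: every request is defined
-- for every state, and it performs no side effects.
instance ApplyDelta CounterRequest CounterState where
  apply (Increment n) (CounterState c) = CounterState (c + n)
  apply Reset         _                = CounterState 0
  apply Read          st               = st

-- handleRequest computes only the response, from the state it is given.
handleRequest :: PartitionKey -> CounterRequest -> CounterState -> CounterResponse
handleRequest _key _req (CounterState c) = CounterValue c

-- Because apply is pure, replaying the same requests from def always
-- produces the same state, no matter how many times it is done.
replay :: [CounterRequest] -> CounterState
replay = foldl' (flip apply) def
```

For instance, replay [Increment 2, Increment 3] evaluates to CounterState 5 every time it is run, which is the property that makes it safe for apply to be called more than once.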

The persistence layer provides the framework with a way to store the various partition states. This allows you to choose any number of persistence strategies, including only in memory, on disk, or in some external database.

See newMemoryPersistence and diskPersistence if you need to get started quickly with an in-memory or on-disk persistence layer.

Indexing

Legion gives you a way to index your partitions so that you can find partitions that have certain characteristics without having to know the partition key a priori. Conceptually, the "index" is a single, global, ordered list of IndexRecords. The search function allows you to scroll forward through this list at will.

Each partition may generate zero or more IndexRecords. This is determined by the index function, which is defined by your specific Legion application. For each Tag returned by index, an IndexRecord is generated such that:

IndexRecord {irTag = <your tag>, irKey = <partition key>}
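
The relationship between index, IndexRecord, and scrolling can be modelled with a small self-contained sketch. Everything here is a local stand-in: Tag wraps a String rather than a ByteString, the Profile state and the recordsFor, globalIndex, and scrollFrom helpers are hypothetical, and ordering records by tag and then by key is an assumption made for illustration, not a statement about how Legion orders its index. A plain IndexRecord is used as the scroll position because the fields of SearchTag are not described here.

```haskell
import Data.List (sort)
import qualified Data.Set as Set

-- Local stand-ins (the real Tag wraps a ByteString, the real key a Word256).
newtype Tag = Tag String deriving (Eq, Ord, Show)
newtype PartitionKey = K Integer deriving (Eq, Ord, Show)

data IndexRecord = IndexRecord { irTag :: Tag, irKey :: PartitionKey }
  deriving (Eq, Ord, Show)

-- Hypothetical partition state: one user profile per partition.
data Profile = Profile { city :: String, active :: Bool }

-- The application-defined index function: zero or more tags per partition.
index :: Profile -> Set.Set Tag
index p = Set.fromList $
  Tag ("city:" ++ city p) : [Tag "active" | active p]

-- One IndexRecord is generated for each tag returned by index.
recordsFor :: PartitionKey -> Profile -> [IndexRecord]
recordsFor key p = [IndexRecord t key | t <- Set.toList (index p)]

-- Conceptually, the index is one global, ordered list of records.
globalIndex :: [(PartitionKey, Profile)] -> [IndexRecord]
globalIndex = sort . concatMap (uncurry recordsFor)

-- Scrolling forward yields only records strictly greater than the position.
scrollFrom :: IndexRecord -> [IndexRecord] -> [IndexRecord]
scrollFrom start = dropWhile (<= start)
```

A caller that wants the next page of results would pass the last record it has already seen as the new starting position.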

data Legionary i o s

This is the type of a user-defined Legion application. Implement this and allow the Legion framework to manage your cluster.

  • i is the type of request your application will handle. i stands for "input".
  • o is the type of response produced by your application. o stands for "output".
  • s is the type of state maintained by your application. More precisely, it is the type of the individual partitions that make up your global application state. s stands for "state".

Constructors

Legionary 

Fields

handleRequest :: PartitionKey -> i -> s -> o

The request handler, implemented by the user to service requests.

Given a request and a state, returns a response to the request.

persistence :: Persistence i s

The user-defined persistence layer implementation.

index :: s -> Set Tag

A way of indexing partitions so that they can be found without knowing the partition key. An index entry for the partition will be created under each of the tags returned by this function.

type LegionConstraints i o s = (ApplyDelta i s, Default s, Binary i, Binary o, Binary s, Show i, Show o, Show s, Eq i)

This is a more convenient way to write the somewhat unwieldy set of constraints:

(
  ApplyDelta i s, Default s, Binary i, Binary o, Binary s, Show i,
  Show o, Show s, Eq i
)

data Persistence i s

The type of a user-defined persistence strategy used to persist partition states. See newMemoryPersistence or diskPersistence if you need to get started quickly.

Constructors

Persistence 

Fields

getState :: PartitionKey -> IO (Maybe (PartitionPowerState i s))
 
saveState :: PartitionKey -> Maybe (PartitionPowerState i s) -> IO ()
 
list :: Source IO (PartitionKey, PartitionPowerState i s)

List all the keys known to the persistence layer, along with their stored partition states. It is important that the implementation do the right thing with regard to addCleanup, because there are cases where the conduit is terminated without reading the entire list.
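
As a rough illustration of what a custom persistence layer involves, the following sketch keeps partition states in a Map behind an IORef. It is deliberately simplified and does not use the real types: the Persistence record below is a local stand-in whose type parameter p takes the place of PartitionPowerState i s, list returns a plain list instead of a conduit Source, and newMapPersistence is a hypothetical name. Treating saveState with Nothing as a deletion is an assumption of this sketch.

```haskell
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Local stand-in for Legion's PartitionKey (the real one wraps a Word256).
newtype PartitionKey = K Integer deriving (Eq, Ord, Show)

-- Simplified stand-in for Persistence i s. The parameter p replaces
-- PartitionPowerState i s, and list is a plain IO list, not a Source.
data Persistence p = Persistence
  { getState  :: PartitionKey -> IO (Maybe p)
  , saveState :: PartitionKey -> Maybe p -> IO ()
  , list      :: IO [(PartitionKey, p)]
  }

-- An in-memory store backed by a Map held in an IORef.
newMapPersistence :: IO (Persistence p)
newMapPersistence = do
  ref <- newIORef Map.empty
  pure Persistence
    { getState = \key -> Map.lookup key <$> readIORef ref
      -- Assumption: saving Nothing removes the entry for that key.
    , saveState = \key mstate ->
        atomicModifyIORef' ref $ \m -> (Map.alter (const mstate) key m, ())
    , list = Map.toList <$> readIORef ref
    }
```

A real implementation would store the serialized PartitionPowerState (via its Binary instance) and would have to release any resources held by the listing conduit when it is terminated early.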

class ApplyDelta i s where

The class which allows for delta application.

Methods

apply :: i -> s -> s

Apply a delta to a state value. *This function MUST be total!!!*

newtype Tag

A tag is a value associated with a partition state that can be used to look up a partition key.

Constructors

Tag 

Fields

unTag :: ByteString
 

Other Types

data SearchTag

This data structure describes where in the index to start scrolling.

Constructors

SearchTag 

data IndexRecord

This data structure describes a record in the index.

Constructors

IndexRecord 

Fields

irTag :: Tag
 
irKey :: PartitionKey
 

newtype PartitionKey

This is how partitions are identified and referenced.

Constructors

K 

Fields

unKey :: Word256
 

data PartitionPowerState i s

This is an opaque representation of your application's partition state. Internally, this represents the complete, nondeterministic set of states the partition can be in as a result of concurrency, eventual consistency, and all the other distributed systems reasons your partition state might have more than one value.

You can save these guys to disk in your Persistence layer by using its Binary instance.

Utils

newMemoryPersistence :: IO (Persistence i s)

A convenient memory-based persistence layer. Good for testing or for applications (like caches) that don't have durability requirements.

diskPersistence

Arguments

:: (Binary i, Binary s) 
=> FilePath

The directory under which partition states will be stored.

-> Persistence i s 

A convenient way to persist partition states to disk.