Safe Haskell | None |
---|---|
Language | Haskell2010 |
Legion is a mathematically sound framework for writing horizontally scalable user applications. Historically, horizontal scalability has been achieved via the property of statelessness. Programmers would design their applications to be free of any kind of persistent state, avoiding the problem of distributed state management. This almost never turns out to really be possible, so programmers achieve "statelessness" by delegating application state management to some kind of external, shared database -- which ends up having its own scalability problems.
In addition to scalability problems, which modern databases (especially NoSQL databases) have done a good job of solving, there is another, more fundamental problem facing these architectures: The application is not really stateless.
Legion is a Haskell framework that abstracts state partitioning, data replication, request routing, and cluster rebalancing, making it easy to implement large and robust distributed data applications.
Examples of services that rely on partitioning include ElasticSearch, Riak, DynamoDB, and others. In other words, almost all scalable databases.
- forkLegionary :: (LegionConstraints i o s, MonadLoggerIO io) => Legionary i o s -> RuntimeSettings -> StartupMode -> io (Runtime i o)
- data StartupMode
- data Runtime i o
- data RuntimeSettings = RuntimeSettings {}
- makeRequest :: MonadIO io => Runtime i o -> PartitionKey -> i -> io o
- search :: MonadIO io => Runtime i o -> SearchTag -> Source io IndexRecord
- data Legionary i o s = Legionary {
- handleRequest :: i -> s -> o
- persistence :: Persistence i s
- index :: s -> Set Tag
- type LegionConstraints i o s = (ApplyDelta i s, Default s, Binary i, Binary o, Binary s, Show i, Show o, Show s, Eq i)
- data Persistence i s = Persistence {
- getState :: PartitionKey -> IO (Maybe (PartitionPowerState i s))
- saveState :: PartitionKey -> Maybe (PartitionPowerState i s) -> IO ()
- list :: Source IO (PartitionKey, PartitionPowerState i s)
- class ApplyDelta i s where
- apply :: i -> s -> s
- newtype Tag = Tag {
- unTag :: ByteString
- data SearchTag = SearchTag {
- stTag :: Tag
- stKey :: Maybe PartitionKey
- data IndexRecord = IndexRecord {
- irTag :: Tag
- irKey :: PartitionKey
- newtype PartitionKey = K {}
- data PartitionPowerState i s
- newMemoryPersistence :: IO (Persistence i s)
- diskPersistence :: (Binary i, Binary s) => FilePath -> Persistence i s
Using Legion
Starting the Legion Runtime
While this section is being worked on, you can check out the legion-discovery project for an example of a stateful web services that advantage of Legion's ability to define your own operations on your data. Take a look at `Network.Legion.Discovery.App` to see where the magic of defining a Legion application happens. The rest of the code is mostly just standard HTTP-interface-written-in-Haskell, and requests sent to the Legion runtime.
:: (LegionConstraints i o s, MonadLoggerIO io) | |
=> Legionary i o s | The user-defined legion application to run. |
-> RuntimeSettings | Settings and configuration of the legionary framework. |
-> StartupMode | |
-> io (Runtime i o) |
Forks the legion framework in a background thread, and returns a way to send user requests to it and retrieve the responses to those requests.
data StartupMode Source
This defines the various ways a node can be spun up.
NewCluster | Indicates that we should bootstrap a new cluster at startup. The persistence layer may be safely pre-populated because the new node will claim the entire keyspace. |
JoinCluster SockAddr | Indicates that the node should try to join an existing cluster, either by starting fresh, or by recovering from a shutdown or crash. |
This type represents a handle to the runtime environment of your Legion application. This allows you to make requests and access the partition index.
Runtime
is an opaque structure. Use makeRequest
to access it.
Runtime Configuration
The legion framework has several operational parameters which can be controlled using configuration. These include the address binding used to expose the cluster management service endpoint and what file to use for cluster state journaling.
data RuntimeSettings Source
Settings used when starting up the legion framework runtime.
RuntimeSettings | |
|
Making Runtime Requests
makeRequest :: MonadIO io => Runtime i o -> PartitionKey -> i -> io o Source
Send a user request to the legion runtime.
search :: MonadIO io => Runtime i o -> SearchTag -> Source io IndexRecord Source
Send a search request to the legion runtime. Returns results that are
strictly greater than the provided SearchTag
.
Implementing a Legion Application
Whenever you use Legion to develop a distributed application, your application is going to be divided into two major parts, the stateless part, and the stateful part. The stateless part is going to be the context in which a legion node is running -- probably a web server if you are exposing your application as a web service. Legion itself is focused mainly on the stateful part, and it will do all the heavy lifting on that side of things. However, it is worth mentioning a few things about the stateless part before we move on.
The unit of state that Legion knows about is called a "partition". Each
partition is identified by a PartitionKey
, and it is replicated across
the cluster. Each partition acts as the unit of state for handling
stateful user requests which are routed to it based on the PartitionKey
associated with the request. What the stateful part of Legion is
not able to do is figure out what partition key is associated with
the request in the first place. This is a function of the stateless
part of the application. Generally speaking, the stateless part of
your application is going to be responsible for
- Starting up the Legion runtime using
forkLegionary
. - Identifying the partition key to which a request should be applied (e.g. maybe this is some component of a URL, or else an identifier stashed in a browser cookie).
- Marshalling application requests into requests to the Legion runtime.
- Marshalling the Legion runtime response into an application response.
Legion doesn't really address any of these things, mainly because there are already plenty of great ways to write stateless services. What Legion does provide is a runtime that can be embedded in the stateless part of your application, that transparently handles all of the hard stateful stuff, like replication, rebalancing, request routing, etc.
The only thing required to implement a legion service is to
provide a request handler and a persistence layer by constructing a
Legionary
value and passing it to forkLegionary
. The stateful
part of your application will live mostly within the request handler
handleRequest
. If you look at handleRequest
, you will see that
it is abstract over the type variables i
, o
, and s
.
handleRequest :: PartitionKey -> i -> s -> o
These are the types your application has to fill in. i
stands for
"input", which is the type of requests your application accepts; o
stands for "output", which is the type of responses your application
will generate in response to those requests, and s
stands for "state",
which is the application state that each partition can assume.
Implementing a request handler is pretty straight forward, but
there is a little bit more to it than meets the eye. If you look at
forkLegionary
, you will see a constraint named
, which is short-hand for a long list of typeclasses that
your LegionConstraints
i o si
, o
, and s
types are going to have to implement. Of
particular interest is the ApplyDelta
typeclass. If you look at
handleRequest
, you will see that it is defined in terms of an input,
an existing state, and an output, but there is no mention of any new
state that is generated as a result of handling the request.
This is where the ApplyDelta
typeclass comes in. Where handleRequest
takes an input and a state and produces an output, the apply
function
of the ApplyDelta
typeclass takes an input and a state and produces
a new state.
apply :: i -> s -> s
The reason that Legion splits the definition of what it means to
fully handle an input into two functions like this is because of the
approach it takes to solving distributed systems problems. Describing
this entirely is beyond the scope of this section of documentation
(TODO link to more info) but the TL;DR is that handleRequest
will
only get called once for each input, but apply
has a very good
chance of being called more than once for various reasons including
re-playing the application of requests to resolve non-determinism.
Taking yet another look at handleRequest
, you will see that it
makes no provision for a non-existent partition state (i.e., it is
written in terms of s
, not Maybe s
. Same goes for ApplyDelta
).
This framework takes the somewhat platonic philosophical view that all
mathematical values exist somewhere and that there is no such thing as
non-existent partition. When you first spin up a Legion application,
all of those partitions are going to have a default value, which is
def
(Because your partition state must be an
instance of the Default
typeclass). This doesn't
take up infinite disk space because def
values
are cleverly encoded as a zero-length string of bytes. ;-)
The persistence layer provides the framework with a way to store the various partition states. This allows you to choose any number of persistence strategies, including only in memory, on disk, or in some external database.
See newMemoryPersistence
and diskPersistence
if you need to get
started quickly with an in-memory persistence layer.
Indexing
Legion gives you a way to index your partitions so that you can find
partitions that have certain characteristics without having to know
the partition key a priori. Conceptually, the "index" is a single,
global, ordered list of IndexRecord
s. The search
function allows
you to scroll forward through this list at will.
Each partition may generate zero or more IndexRecord
s. This
is determined by the index
function, which is defined by your
specific Legion application. For each Tag
returned by index
, an
IndexRecord
is generated such that:
@IndexRecord {irTag = <your tag>, irKey = <partition key>}@
This is the type of a user-defined Legion application. Implement this and allow the Legion framework to manage your cluster.
i
is the type of request your application will handle.i
stands for "input".o
is the type of response produced by your application.o
stands for "output"s
is the type of state maintained by your application. More precisely, it is the type of the individual partitions that make up your global application state.s
stands for "state".
Legionary | |
|
type LegionConstraints i o s = (ApplyDelta i s, Default s, Binary i, Binary o, Binary s, Show i, Show o, Show s, Eq i) Source
This is a more convenient way to write the somewhat unwieldy set of constraints
( ApplyDelta i s, Default s, Binary i, Binary o, Binary s, Show i, Show o, Show s, Eq i )
data Persistence i s Source
The type of a user-defined persistence strategy used to persist
partition states. See newMemoryPersistence
or
diskPersistence
if you need to get started quickly.
Persistence | |
|
class ApplyDelta i s where Source
The class which allows for delta application.
A tag is a value associated with a partition state that can be used to look up a partition key.
Tag | |
|
Other Types
This data structure describes where in the index to start scrolling.
data IndexRecord Source
This data structure describes a record in the index.
IndexRecord | |
|
newtype PartitionKey Source
This is how partitions are identified and referenced.
data PartitionPowerState i s Source
This is an opaque representation of your application's partition state. Internally, this represents the complete, nondeterministic set of states the partition can be in as a result of concurrency, eventual consistency, and all the other distributed systems reasons your partition state might have more than one value.
You can save these guys to disk in your Persistence
layer by using its Binary
instance.
(Show i, Show s) => Show (PartitionPowerState i s) Source | |
(Binary i, Binary s) => Binary (PartitionPowerState i s) Source |
Utils
newMemoryPersistence :: IO (Persistence i s) Source
A convenient memory-based persistence layer. Good for testing or for applications (like caches) that don't have durability requirements.
:: (Binary i, Binary s) | |
=> FilePath | The directory under which partition states will be stored. |
-> Persistence i s |
A convenient way to persist partition states to disk.