| Version 4 (modified by mgruen, 3 years ago) |
|---|
Architecture overview
hackage-server is based on a modular architecture which encourages a full-fledged REST-based public interface and not just a collection of scripts.
The fundamental unit of the Hackage website is the feature. Examples of features include user registration, where users can sign up for their own accounts; reverse dependencies, calculated from the main package index; and preferred versions, which let package maintainers recommend some versions over others for the package installer.
Each feature keeps some kind of data, and it serves that data on the website and is responsible for its backup. Features may depend on each other and use each other's data. To install a feature, it has to be compiled into the source, and swaths of functionality can be enabled or disabled by adding or deleting a few lines of source code.
This is a work in progress. Nearly everything here was implemented this past summer (2010). 1800 words and counting.
Types
All that the server needs to know about a feature is in the HackageModule type:
data HackageModule = HackageModule {
    featureName   :: String,
    resources     :: [Resource],
    dumpBackup    :: Maybe (BlobStorage -> IO [BackupEntry]),
    restoreBackup :: Maybe (BlobStorage -> RestoreBackup)
  }
The name should be an alphabetical string, preferably just one word, which shouldn't conflict with other features' names.
The [Resource] field is the most important one: it defines which pages the feature will serve. The backup fields, both optional, can be used to export and import a human-readable representation of a feature's data. They are used only for periodic snapshots, not for persistent state, which happstack-state handles.
Features will probably need to provide information beyond what the HackageModule fields hold. For that reason, the actual feature object can be any data structure whatsoever, so long as you can get a HackageModule from it. In other words, it has to implement the HackageFeature typeclass:
class HackageFeature a where
    getFeature :: a -> HackageModule
    initHooks  :: a -> [IO ()]
    initHooks  = const []  -- default: no start-up actions
The initHooks are a list of miscellaneous actions to run at startup, such as running hooks, initializing caches, or forking maintenance threads. The default implementation is (const []).
Each feature provides an initialization function for its own feature object. This function usually just takes the global server config and possibly other feature objects as its arguments. It's up to the top-level server code to pass the initialization function the required parameters. Once all of the feature objects have been initialized, all of their initHooks are called. Then, getFeature is called on all of the feature objects, yielding a list of HackageModules. Depending on the startup mode, this list can be used to import a package tarball, export one, or start serving web pages.
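The startup sequence described above can be sketched roughly as follows. This is a minimal illustration, not the actual hackage-server code: the Resource and HackageModule types are cut-down stand-ins (the backup fields are omitted), and BlogFeature, PingFeature, initBlogFeature, and startup are hypothetical names invented for the example.

```haskell
-- Cut-down stand-ins for the real types (assumptions for this sketch).
data Resource = Resource { resourceLocation :: String }

data HackageModule = HackageModule
  { featureName :: String
  , resources   :: [Resource]
  }

class HackageFeature a where
  getFeature :: a -> HackageModule
  initHooks  :: a -> [IO ()]
  initHooks = const []   -- default: nothing to do at startup

-- A hypothetical feature object carrying extra data of its own.
data BlogFeature = BlogFeature { blogTitle :: String }

instance HackageFeature BlogFeature where
  getFeature _ = HackageModule
    { featureName = "blog"
    , resources   = [Resource "/blog/post/:id"]
    }
  initHooks blog = [putStrLn ("initializing cache for " ++ blogTitle blog)]

-- A second hypothetical feature that relies on the default initHooks.
data PingFeature = PingFeature

instance HackageFeature PingFeature where
  getFeature _ = HackageModule
    { featureName = "ping"
    , resources   = [Resource "/ping"]
    }

-- Hypothetical initialization function; a real one would take the
-- server config and possibly other feature objects.
initBlogFeature :: String -> IO BlogFeature
initBlogFeature title = return (BlogFeature title)

-- Top-level startup: initialize features, run every initHook,
-- then collect the HackageModules.
startup :: IO [HackageModule]
startup = do
  blog <- initBlogFeature "Example blog"
  let ping = PingFeature
  sequence_ (initHooks blog ++ initHooks ping)
  return [getFeature blog, getFeature ping]

main :: IO ()
main = do
  modules <- startup
  mapM_ (putStrLn . featureName) modules
```

Because each feature object has its own type, the top-level code in this sketch calls initHooks and getFeature on each one individually and only then gathers the results into a homogeneous list.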
Important bits of Happstack
Happstack is a web framework for Haskell. happstack-server provides utilities for responding to HTTP requests. happstack-state provides persistent storage of native Haskell datatypes.
(insert useful happstack functions here, necessary for the next section)
How to serve from a URI
The way to route a given URI to a given page is through a Resource. Each HackageModule has a list of them it expects the server to serve. Routing is driven by a URI specification string, similar to the kind used by Ruby on Rails, Pylons, and other web frameworks.
Consider a URI path like "/blog/post/7", which the client would access as "http://website.org/blog/post/7". It has three path components: "blog", "post", and "7". The first two are static components: say that when you're serving a blog post, the URI must always start with "/blog/post". However, the third component is dynamic, meaning that you might want to take an arbitrary number there, 7 or 12 or 81, even though not all numbers have their own blog post (in which case you'd return a 404 page, "post not found"). Knowing this, you would use the string "/blog/post/:id" to indicate the structure of the URI, and you could make a resource from it using resourceAt :: String -> Resource. Any path components starting with a colon are dynamic. Any other ones are static.
A Resource defines not only the structure of URIs it will respond to, but also the functions themselves that do the responding. It can respond to the four most common HTTP methods, GET, POST, PUT, and DELETE, using the Happstack web framework. Whenever you want to respond to an HTTP request from a given Resource, you have to provide a function of the type 'DynamicPath -> ServerPart Response'. 'ServerPart Response' is the Happstack type for a procedure in the server monad that produces a Response object. DynamicPath is a mapping from the names of the dynamic path components to their values. In particular, it's an association list, [(String, String)].
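To make the static/dynamic distinction concrete, here is a small self-contained sketch of how a specification string could be matched against a request path to produce a DynamicPath. It is not how resourceAt is implemented in hackage-server; Component, splitPath, parseSpec, and matchPath are hypothetical names used only for illustration.

```haskell
type DynamicPath = [(String, String)]

-- One component of a URI specification (hypothetical type).
data Component = Static String | Dynamic String
  deriving (Eq, Show)

-- Split a path on '/', ignoring empty components.
splitPath :: String -> [String]
splitPath s = case dropWhile (== '/') s of
  ""   -> []
  rest -> let (c, rest') = break (== '/') rest
          in c : splitPath rest'

-- Components starting with a colon are dynamic; all others are static.
parseSpec :: String -> [Component]
parseSpec = map toComponent . splitPath
  where
    toComponent (':' : name) = Dynamic name
    toComponent name         = Static name

-- Match a request path against a spec, collecting dynamic values.
matchPath :: [Component] -> String -> Maybe DynamicPath
matchPath spec path = go spec (splitPath path)
  where
    go [] [] = Just []
    go (Static s : cs) (p : ps)
      | s == p    = go cs ps
      | otherwise = Nothing
    go (Dynamic name : cs) (p : ps) = ((name, p) :) <$> go cs ps
    go _ _ = Nothing

main :: IO ()
main = do
  let spec = parseSpec "/blog/post/:id"
  print (matchPath spec "/blog/post/7")   -- Just [("id","7")]
  print (matchPath spec "/blog/about")    -- Nothing
```

Note that the dynamic value stays a String at this point; turning "7" into a number, and deciding what to do when that fails, is left to the handler.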
To define a basic blog post Resource, you would write (todo: link happstack docs):
blogPost :: Resource
blogPost = (resourceAt "/blog/post/:id")
    { resourceGet = [("txt", serveBlogPost)]
    , resourcePut = [("txt", setBlogPost)]
    }
-- GET: look up the post named by the dynamic :id component.
serveBlogPost :: DynamicPath -> ServerPart Response
serveBlogPost dpath = case fromReqURI =<< lookup "id" dpath of
    Nothing  -> notFound . toResponse $ "Invalid number"
    Just pid -> do
        mcontents <- query $ LookupPost pid
        case mcontents of
            Nothing       -> notFound . toResponse $ "Post #" ++ show pid ++ " not found"
            Just contents -> ok . toResponse $ contents
-- PUT: store the "contents" request field as the post's new text.
setBlogPost :: DynamicPath -> ServerPart Response
setBlogPost dpath = case fromReqURI =<< lookup "id" dpath of
    Nothing  -> notFound . toResponse $ "Invalid number"
    Just pid -> do
        mcontents <- getDataFn $ look "contents"
        case mcontents of
            Nothing       -> badRequest . toResponse $ "Bad input, couldn't find text"
            Just contents -> do
                update $ SetPost pid contents
                ok . toResponse $ contents
At the very top, the Resource object is created, then the GET and PUT handlers are added using record update notation. Both resourceGet and resourcePut expect a [(Content, DynamicPath -> ServerPart Response)] list, where the first component of each pair names a content type in case multiple formats are wanted. (See formats.) Both handlers here return a text/plain response, so "txt" is used.
There's a fair amount of boilerplate processing in the samples above. To avoid this, hackage-server takes the approach of defining combinators like (...)
hackage-server amenities
The following are special things provided by hackage-server, each of which possibly deserves a section or page or two. Some of them might be split off into their own packages or merged into happstack.
- Format-generic structure. Features rarely define pages in specific formats. Instead, there are special view features for HTML, JSON, YAML, and whatever other format you can imagine, which use functions exported by other features to display data. This has a few advantages. First, all of the HTML generation is in one place, and it is trivial to switch between HTML generation engines just by switching out their features. Second, HTML often cross-cuts between many features, and the approach of combining their HTML in a big list and mashing it together makes ugly webpages.
- Format-generic failing, to be described in the above section. One of the tools used to produce error messages in any format is the MServerPart type, which is just a ServerPart (Either ErrorResponse a). With hackage-server's combinator-based approach to data retrieval, this allows a page to explain exactly why it couldn't fulfill a request, and is more elegant than throwing an exception or serving the same 404 page everywhere.
- Caches. These are just non-persistent values in memory, updateable asynchronously and atomically. Beware, they're not updated until the new value is fully evaluated. There should be more fine-grained control over their operation. (Side note: there is currently no server-side or client-side cache middleware, which needs a more systematic approach than this. Last-modified would be simple if each feature just stored more timestamps, but ETags are hella complicated where multiple content-types and PUT are involved.)
- Hooks. Hooks are generally called after an update happens, and they can take any number of arguments and run an IO action. They may call other Hooks in turn, but they shouldn't take too long. They are processed in sequence, in the reverse of the order in which they were added.
- Filters are generally called before an update happens, with the ability to stop the event, or inject some value into it. They can also take any number of arguments, and return a typed IO result. They use the same internal representation as Hooks, but have more specific utility functions.
- Basic and digest authentication. This authenticates an access control list against the user database using a stateless approach.
- Backup. One of the more complicated backup types is RestoreBackup, which each feature wanting to import data should implement. (explain)
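As an illustration of the cache behaviour described in the list above (an in-memory value that is only swapped in once the replacement is fully evaluated), here is a minimal sketch. It is an assumption-laden stand-in rather than hackage-server's own cache module: Cache, newCache, readCache, and putCache are hypothetical names, and the sketch relies on the deepseq package that ships with GHC.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.IORef (IORef, atomicWriteIORef, newIORef, readIORef)

-- A non-persistent, in-memory value (hypothetical type).
newtype Cache a = Cache (IORef a)

newCache :: NFData a => a -> IO (Cache a)
newCache initial = Cache <$> (evaluate (force initial) >>= newIORef)

-- Readers always see a fully evaluated value.
readCache :: Cache a -> IO a
readCache (Cache ref) = readIORef ref

-- Fully evaluate the new value first, then swap it in atomically.
-- Until evaluation finishes, readers keep seeing the old value.
putCache :: NFData a => Cache a -> a -> IO ()
putCache (Cache ref) new = do
  new' <- evaluate (force new)
  atomicWriteIORef ref new'

main :: IO ()
main = do
  cache <- newCache (0 :: Int)
  done  <- newEmptyMVar
  -- Asynchronous update: the expensive computation runs in another thread.
  _ <- forkIO $ do
    putCache cache (sum [1 .. 1000])
    putMVar done ()
  takeMVar done
  readCache cache >>= print   -- 500500
```

The caveat from the list applies here too: a slow computation passed to putCache delays the update, so readers may see stale data for as long as evaluation takes.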
Core Hackage types
- PackageIndex
- PkgInfo
- Users
Hook summary
Presently, these are used to keep features in sync. This is important because features often keep their own (Map PackageName Blah) to mirror the central one. They're used for push-caching as well in some cases.
- CoreFeature provides a lot of them. Any features that update the core data set should call them appropriately. (Either that, or more wrapper functions which call them automatically.)
- packageAddHook :: Hook (PkgInfo -> IO ()). This is called when a package name and version is added to the index which was not there previously.
- packageRemoveHook :: Hook (PkgInfo -> IO ()). Called when a package name and version is totally wiped from the index (should be a rare occurrence). It is sometimes annoying to implement removal for auxiliary indices, but it helps ensure feature consistency.
- packageChangeHook :: Hook (PkgInfo -> PkgInfo -> IO ()). Called when the entry for a package name and version is changed from an old value (first argument) to a new one (second argument).
- packageIndexChange :: Hook (IO ()). Called whenever the index tarball needs to be changed. Dubious.
- newPackageHook :: Hook (PkgInfo -> IO ()). Called after packageAddHook only if this is the first version of this package to be put in the main index.
- noPackageHook :: Hook (PkgInfo -> IO ()). Called after packageRemoveHook only if there are no more versions of that package left in the main index.
- tarballDownload :: Hook (PackageId -> IO ()). Called whenever a package tarball is downloaded.
- UserFeature provides:
- userAdded :: Hook (IO ()). More hooks to come.
- VersionsFeature provides:
- preferredHook :: Hook (PackageName -> PreferredInfo -> IO ()). Called whenever the preferred versions for a package are updated.
- deprecatedHook :: Hook (PackageName -> Maybe [PackageName] -> IO ()). Called whenever a package is deprecated (in favor of Just pkgs) or undeprecated (Nothing).
- TagsFeature provides:
- tagsUpdated :: Hook (Set PackageName -> Set Tag -> IO ()). Called whenever tags are updated. The first argument is all packages affected and the second argument is all tags affected. In most cases one of these will be a singleton set.
- ReverseFeature provides:
- reverseUpdateHook :: Hook (Map PackageName [Version] -> IO ()). Called whenever the reverse package index is updated. It lists the packages whose revdeps index was updated, and for those packages, which versions were affected.
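To show how hooks of these types might be registered and run, here is a minimal sketch of a hook as a mutable list of actions. It is not the hackage-server implementation: newHook, registerHook, and runHook as written here are hypothetical, and a plain String stands in for PkgInfo. Prepending on registration gives the "newest first" order mentioned earlier.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- A hook is a mutable list of registered actions (hypothetical representation).
newtype Hook a = Hook (IORef [a])

newHook :: IO (Hook a)
newHook = Hook <$> newIORef []

-- Prepending means the most recently added action runs first.
registerHook :: Hook a -> a -> IO ()
registerHook (Hook ref) action = modifyIORef ref (action :)

-- Run every registered action in sequence; the caller supplies the
-- arguments, so one function covers hooks of any arity.
runHook :: Hook a -> (a -> IO ()) -> IO ()
runHook (Hook ref) apply = readIORef ref >>= mapM_ apply

main :: IO ()
main = do
  -- String is a stand-in for PkgInfo in this sketch.
  packageAddHook <- newHook :: IO (Hook (String -> IO ()))
  registerHook packageAddHook (\pkg -> putStrLn ("reverse index: adding " ++ pkg))
  registerHook packageAddHook (\pkg -> putStrLn ("tags: adding " ++ pkg))
  -- Prints the "tags" line first, then the "reverse index" line.
  runHook packageAddHook ($ "foo-1.0")
```

A two-argument hook such as packageChangeHook would be run the same way, for example with `runHook hook (\f -> f old new)`.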
Filter summary
- UploadFeature provides:
- canUploadPackage :: Filter (UserId -> UploadResult -> IO (Maybe ErrorResponse)). Called before adding a package to the *main* index. It uses the UploadResult type, which previews the cabal file, the GenericPackageDescription, and any upload warnings. If a filter returns an ErrorResponse, the operation is aborted.
- UserFeature provides:
- packageMutate :: Filter (UserId -> IO Bool). Called whenever updating any kind of package index, even a non-main one. This can be used to let anyone use the social features, but only let approved people upload packages, which is the current system as of August 2010.
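A filter can be sketched the same way as a hook, except that each registered function returns a result which the caller inspects before going ahead. The code below is an illustrative assumption, not hackage-server's API: newFilter, registerFilter, runFilter, and checkUpload are hypothetical, UserId and ErrorResponse are simplified stand-ins, and a String stands in for UploadResult.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Data.Maybe (catMaybes)

-- Same representation as a hook: a mutable list of registered functions.
newtype Filter a = Filter (IORef [a])

newFilter :: IO (Filter a)
newFilter = Filter <$> newIORef []

registerFilter :: Filter a -> a -> IO ()
registerFilter (Filter ref) f = modifyIORef ref (f :)

-- Unlike a hook, running a filter collects typed results.
runFilter :: Filter a -> (a -> IO b) -> IO [b]
runFilter (Filter ref) apply = readIORef ref >>= mapM apply

-- Simplified stand-ins for the real types.
type UserId = Int
newtype ErrorResponse = ErrorResponse String
  deriving (Eq, Show)

-- Abort the upload if any registered filter reports an error.
checkUpload :: Filter (UserId -> String -> IO (Maybe ErrorResponse))
            -> UserId -> String -> IO (Either ErrorResponse ())
checkUpload canUpload uid pkg = do
  results <- runFilter canUpload (\check -> check uid pkg)
  return $ case catMaybes results of
    (err : _) -> Left err
    []        -> Right ()

main :: IO ()
main = do
  canUploadPackage <- newFilter
  -- Hypothetical policy: only user 1 is an approved uploader.
  registerFilter canUploadPackage $ \uid _ ->
    return (if uid == 1 then Nothing
            else Just (ErrorResponse "not an approved uploader"))
  checkUpload canUploadPackage 1 "foo-1.0" >>= print
  checkUpload canUploadPackage 2 "foo-1.0" >>= print
```

This sketch runs every filter and reports the first error found; stopping at the first failure would be an equally reasonable choice.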
General deficiencies
- Routing is untyped (stringly typed), a common but unfortunate situation for most routing libraries which vanilla Happstack manages to avoid. This is a necessary evil when Haskell lacks inclusion polymorphism and subtyping (e.g. the Restlet approach hides some of the complexity with subclasses), but combinator functions can help alleviate it. For example, withPackageName converts a list of string path components into a PackageName, and other variants of with* functions do data querying as well.
- Server-side and client-side caching is only ad-hoc presently.
- Run grep -R 'TODO\|FIXME' . and expect a screenful or two.
- User groups have a messy interface. Although each feature should define and manage its own, it is still desirable to map a user to the groups they are in. Presently this is done by mapping a UserId to a bunch of URIs, and each URI to a group description.
- Too many cross-cutting Happstack queries. Features should hide Happstack queries and updates in their own wrappers, which may also call hooks and do any related maintenance.
- The system of relying on hooks and filters to keep features in sync is fragile, and it's easy to miss something and get inconsistent results. Perhaps there should be a package mapping which has the express purpose of syncing with the main index. This is one of the disadvantages of using native Haskell types, though the advantages are plenty.
- As ever, eliminate as much boilerplate as sanely possible.
