Changes between Version 7 and Version 8 of HackageDB/2.0/Architecture

Timestamp: 08/16/10 11:20:14 (3 years ago)
Author: mgruen
Comment: organize architecture page

  • HackageDB/2.0/Architecture

    v7 v8  
    1 == Architecture overview == 
    21hackage-server is based on a modular architecture which encourages a 
    32full-fledged REST-based public interface and not just a collection of scripts. 
     
    1514lines of source code. 
    1615 
    17 This is a work in progress. Nearly everything here was 
    18 implemented this past summer (2010). 
    19  
    20 == Types == 
    21 All that the server needs to know about a feature is in the !HackageModule type: 
    22 {{{ 
    23 data HackageModule = HackageModule { 
    24     featureName   :: String, 
    25     resources     :: [Resource], 
    26     dumpBackup    :: Maybe (BlobStorage -> IO [BackupEntry]), 
    27     restoreBackup :: Maybe (BlobStorage -> RestoreBackup) 
    28 } 
    29 }}} 
    30  
    31 The name should be an alphabetical string, preferably just one word, which 
    32 shouldn't conflict with other features' names. 
    33  
    34 The [Resource] field is the most important one: it defines which pages the 
    35 feature will serve. The backup fields, which are both optional, can be used 
    36 to export and import a human-readable representation of a feature's data. This 
    37 is not used for persistent state (since happstack-state is), only for periodic 
    38 snapshots. 
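
As a rough illustration of the record above, here is a minimal sketch of filling in a !HackageModule for a made-up "blog" feature. The types Resource, !BlobStorage, !BackupEntry and !RestoreBackup are simplified stand-ins invented for this example, not the real hackage-server definitions, and duplicateNames is a hypothetical helper that checks the "names shouldn't conflict" rule.

```haskell
import Data.List (group, sort)

-- Stand-in types (assumptions for this sketch only; the real
-- definitions live in hackage-server and are richer than this).
newtype Resource = Resource { resourceLocation :: String }
data BlobStorage = BlobStorage
type BackupEntry = ([FilePath], String)
data RestoreBackup = RestoreBackup

-- Same shape as the record shown above.
data HackageModule = HackageModule
    { featureName   :: String
    , resources     :: [Resource]
    , dumpBackup    :: Maybe (BlobStorage -> IO [BackupEntry])
    , restoreBackup :: Maybe (BlobStorage -> RestoreBackup)
    }

-- A hypothetical feature: one resource, export only, no import.
blogModule :: HackageModule
blogModule = HackageModule
    { featureName   = "blog"
    , resources     = [Resource "/blog/post/:id"]
    , dumpBackup    = Just (\_ -> return [(["posts.txt"], "post contents")])
    , restoreBackup = Nothing
    }

-- Hypothetical helper: feature names that occur more than once.
duplicateNames :: [HackageModule] -> [String]
duplicateNames ms = [n | (n : _ : _) <- group (sort (map featureName ms))]
```

Both backup fields are Maybe values, so a feature with nothing to export or import can simply leave them as Nothing.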
    39  
    40 Features will probably need to provide additional information besides what's available in the 
    41 !HackageModule field. For that reason, the actual feature object can be any 
    42 data structure whatsoever, so long as you can get a !HackageModule from it. 
    43 In other words, it has to implement the !HackageFeature typeclass: 
     16If you'd like to learn about the internals of the server and extend it: 
     17 * [wiki:HackageDB/2.0/Architecture/Types]: the core server types 
     18 * [wiki:HackageDB/2.0/Architecture/Happstack]: a primer for Happstack usage 
     19 * [wiki:HackageDB/2.0/Architecture/Resource]: Hackage's routing system 
     20 * [wiki:HackageDB/2.0/Architecture/Extras]: extra functionality the server uses 
     21 * [wiki:HackageDB/2.0/Architecture/Hook]: summary of hooks and filters 
     22 * [wiki:HackageDB/2.0/Architecture/Backup]: implement backup/restore functionality 
    4423 
    4524{{{ 
    46 class HackageFeature a where 
    47     getFeature :: a -> HackageModule 
    48     initHooks  :: a -> [IO ()] 
    49 }}} 
     25#!comment 
     26Some deficiencies - not sure where to put these 
    5027 
    51 (The initHooks are a list of miscellaneous actions to do on start up. This could 
    52 be running hooks, initializing caches, forking maintenance threads, and so on. 
    53 The default implementation of this is (const []).) 
    54  
    55 Each feature provides an 
    56 initialization function for its own feature object. This function usually 
    57 just takes the global server config and possibly other feature objects as 
    58 its argument. It's up to the top-level server code to pass the initialization 
    59 function the required parameters. Once all of the feature objects have been 
    60 initialized, all of their initHooks are called. Then, getFeature is called on 
    61 all of the feature objects, yielding 
    62 a list of !HackageModules. Depending on the startup mode, this list can be used 
    63 to import a package tarball, export one, or start serving web pages. 
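
The startup sequence described above could be sketched roughly as follows. This is only an illustration: !HackageModule is cut down to a name, the two feature objects and initTagsFeature are hypothetical, and the real server code is organized differently.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)

-- Cut-down stand-in for the real record (assumption: only the name matters here).
newtype HackageModule = HackageModule { featureName :: String }

class HackageFeature a where
    getFeature :: a -> HackageModule
    initHooks  :: a -> [IO ()]
    initHooks = const []              -- the default mentioned above

-- Hypothetical feature objects.
data CoreFeature = CoreFeature
newtype TagsFeature = TagsFeature (IORef [String])   -- holds a small cache

instance HackageFeature CoreFeature where
    getFeature _ = HackageModule "core"

instance HackageFeature TagsFeature where
    getFeature _ = HackageModule "tags"
    -- One startup action: warm the cache.
    initHooks (TagsFeature cache) = [modifyIORef cache ("warm" :)]

-- Hypothetical initialization function that takes another feature object.
initTagsFeature :: CoreFeature -> IO TagsFeature
initTagsFeature _core = TagsFeature <$> newIORef []

-- Top-level wiring: initialize, run every initHook, then collect modules.
startup :: IO ([String], [String])
startup = do
    let core = CoreFeature
    tags@(TagsFeature cache) <- initTagsFeature core
    sequence_ (initHooks core ++ initHooks tags)
    let modules = [getFeature core, getFeature tags]
    cacheContents <- readIORef cache
    return (map featureName modules, cacheContents)
```

Here the top-level code decides the order of initialization, which is how one feature object can be passed to another feature's initialization function.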
    64  
    65 == Important bits of Happstack == 
    66 Happstack is a web framework for Haskell. happstack-server provides utilities 
    67 for responding to HTTP requests. happstack-state provides persistent storage 
    68 of native Haskell datatypes.  
    69  
    70 (insert useful happstack functions here, necessary for the next section) 
    71  
    72 == How to serve from a URI == 
    73 The way to route a given URI to a given page is through a Resource. Each 
    74 !HackageModule has a list of them it expects the server to serve. Routing 
    75 is done through a URI specification string, similar to the kind 
    76 used by Ruby on Rails, Pylons, and other web frameworks. 
    77  
    78 Consider a URI path like "/blog/post/7", which the client would access as "http://website.org/blog/post/7". 
    79 It has three path components: "blog", "post", and "7". The 
    80 first two are static components: say that when you're serving a blog post, the 
    81 URI must always start with "/blog/post". However, the third component is 
    82 dynamic, meaning that you might want to take an arbitrary number there, 7 or 
    83 12 or 81, even though not all numbers have their own blog post (in which 
    84 case you'd return a 404 page, "post not found"). Knowing this, you would 
    85 use the string "/blog/post/:id" to indicate the structure of the URI, 
    86 and you could make a resource from it using resourceAt :: String -> Resource. 
    87 Any path components starting with a colon are dynamic. Any other ones are static. 
    88  
    89 A Resource defines not only the structure of URIs it will respond to, but also 
    90 the functions themselves that do the responding. It can respond to the four most common HTTP methods, 
    91 GET, POST, PUT, and DELETE, using the Happstack web framework. Whenever you 
    92 want to respond to an HTTP request from a given Resource, you have to provide 
    93 a function of the type '!DynamicPath -> !ServerPart Response'. '!ServerPart Response' 
    94 is the Happstack type for a procedure in the server monad that produces a Response 
    95 object. !DynamicPath, in turn, is defined as a mapping from the names of the dynamic 
    96 path components to their values. In particular, it's an association list, 
    97 [(String, String)]. 
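
To make the colon convention concrete, here is a small sketch of how a specification string could be matched against a request path to produce a !DynamicPath. The functions splitPath and matchSpec are hypothetical names written for this example; they are not the routing code hackage-server actually uses.

```haskell
-- Association list from dynamic component names to their values.
type DynamicPath = [(String, String)]

-- Split "/blog/post/7" into ["blog", "post", "7"].
splitPath :: String -> [String]
splitPath s = case dropWhile (== '/') s of
    "" -> []
    s' -> let (component, rest) = break (== '/') s'
          in component : splitPath rest

-- Match a spec such as "/blog/post/:id" against a concrete path.
-- Components starting with ':' are dynamic and bind a value;
-- all other components are static and must match exactly.
matchSpec :: String -> String -> Maybe DynamicPath
matchSpec spec path = go (splitPath spec) (splitPath path)
  where
    go [] [] = Just []
    go ((':' : name) : ss) (p : ps) = ((name, p) :) <$> go ss ps
    go (s : ss) (p : ps) | s == p = go ss ps
    go _ _ = Nothing
```

With these definitions, matchSpec "/blog/post/:id" "/blog/post/7" yields Just [("id", "7")], while a path with a different static prefix or a different number of components yields Nothing.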
    98  
    99 To define a basic blog post Resource, you would write: 
    100  
    101 {{{ 
    102 blogPost :: Resource 
    103 blogPost = (resourceAt "/blog/post/:id") { resourceGet = [("txt", serveBlogPost)], resourcePut = [("txt", setBlogPost)]} 
    104  
    105 serveBlogPost :: DynamicPath -> ServerPart Response 
    106 serveBlogPost dpath = case fromReqURI =<< lookup "id" dpath of 
    107     Nothing  -> notFound . toResponse $ "Invalid number" 
    108     Just pid -> do 
    109         mcontents <- query $ LookupPost pid 
    110         case mcontents of 
    111             Nothing -> notFound . toResponse $ "Post #" ++ show pid ++ " not found" 
    112             Just contents -> ok . toResponse $ contents 
    113  
    114 setBlogPost :: DynamicPath -> ServerPart Response 
    115 setBlogPost dpath = case fromReqURI =<< lookup "id" dpath of 
    116     Nothing -> notFound . toResponse $ "Invalid number" 
    117     Just pid -> do 
    118         mcontents <- getDataFn $ look "contents" 
    119         case mcontents of 
    120             Nothing -> badRequest . toResponse $ "Bad input, couldn't find text" 
    121             Just contents -> do 
    122                 update $ SetPost pid contents 
    123                 ok . toResponse $ contents 
    124 }}} 
    125  
    126 At the very top, the Resource object is created, then GET/PUT methods are added 
    127 using record update notation. Both resourceGet and resourcePut expect a 
    128 [(Content, !DynamicPath -> !ServerPart Response)] list, where the first component of each pair 
    129 is a content-type in case multiple formats are wanted. (See formats.) They 
    130 each return a text/plain response, so "txt" is used. 
    131  
    132 There's a fair amount of boilerplate processing in the samples above. To avoid 
    133 this, hackage-server takes the approach of defining combinators like (...) 
    134  
    135 == hackage-server amenities == 
    136 The following are special things provided by hackage-server, each of which possibly deserves a section or page or two. Some of them might be split off into their own packages or merged into happstack. 
    137  
    138  * Format-generic structure. Features rarely define pages in specific formats. Instead, there are special view features for HTML, JSON, YAML, and whatever other format you can imagine, which use functions exported by other features to display data. This has a few advantages. First, all of the HTML generation is in one place, and it is trivial to switch between HTML generation engines just by switching out their features. Second, HTML often cross-cuts between many features, and the approach of combining their HTML in a big list and mashing it together makes ugly webpages. 
    139  * Format-generic failing, to be described in the above section. One of the tools used to produce error messages in any format is the MServerPart type, which is just a !ServerPart (Either !ErrorResponse a). With hackage-server's combinator-based approach to data retrieval, this allows a page to explain exactly why it couldn't fulfill a request, and is more elegant than throwing an exception or serving the same 404 page everywhere. 
    140  * Caches. These are just non-persistent values in memory, updateable asynchronously and atomically. Beware, they're not updated until the new value is fully evaluated. There should be more fine-grained control over their operation. (Side note: there is currently no server-side or client-side cache middleware, which need a more systematic approach than this. Last-modified would be simple if each feature just stored more timestamps, but ETags are hella complicated where multiple content-types and PUT are involved.) 
    141  * Hooks. Hooks are generally called after an update happens, and they can take any number of arguments and run an IO action. They may call other Hooks in turn, but they shouldn't take too long. They are processed in sequence, in the reverse order in which they were added. 
    142  * Filters are generally called before an update happens, with the ability to stop the event, or inject some value into it. They can also take any number of arguments, and return a typed IO result. They use the same internal representation as Hooks, but have more specific utility functions. 
    143  * Basic and digest authentication. This authenticates an access control list against the user database using stateless HTTP. 
    144  * Backup. One of the more complicated backup types is !RestoreBackup, which each feature wanting to import data should implement. (explain) 
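
The Hooks item above (run in sequence, newest first) can be illustrated with a minimal sketch built on an IORef holding a list of callbacks. The names newHook, registerHook and runHook1 and the representation itself are assumptions made for this example and may not match the actual hackage-server module.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)

-- A hook is a mutable list of callbacks (assumed representation).
newtype Hook a = Hook (IORef [a])

newHook :: IO (Hook a)
newHook = Hook <$> newIORef []

-- Prepending means the most recently added callback runs first.
registerHook :: Hook a -> a -> IO ()
registerHook (Hook ref) f = modifyIORef ref (f :)

-- Run every callback of a one-argument hook, in sequence.
runHook1 :: Hook (a -> IO ()) -> a -> IO ()
runHook1 (Hook ref) x = readIORef ref >>= mapM_ ($ x)

-- Register two callbacks, fire the hook once, and return the
-- messages in the order the callbacks actually ran.
demo :: IO [String]
demo = do
    out  <- newIORef []
    hook <- newHook
    registerHook hook (\pkg -> modifyIORef out (("first: " ++ pkg) :))
    registerHook hook (\pkg -> modifyIORef out (("second: " ++ pkg) :))
    runHook1 hook "foo-1.0"
    reverse <$> readIORef out
```

Running demo gives ["second: foo-1.0", "first: foo-1.0"], showing that the callback registered last is called first.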
    145  
    146 == Core Hackage types == 
    147  * !PackageIndex 
    148  * !PkgInfo 
    149  * Users 
    150  
    151 == Hook summary == 
    152 Presently, these are used to keep features in sync. This is important because features often keep their own (Map !PackageName Blah) to mirror the central one. They're used for push-caching as well in some cases. 
    153  
    154  * !CoreFeature provides a lot of them. Any features that update the core data set should call them appropriately. (Either that, or more wrapper functions which call them automatically.) 
    155    * packageAddHook :: Hook (!PkgInfo -> IO ()). This is called when a package name and version is added to the index which was not there previously. 
    156    * packageRemoveHook :: Hook (!PkgInfo -> IO ()). Called when a package name and version is totally wiped from the index (should be a rare occurrence). It is sometimes annoying to implement removal for auxiliary indices, but it helps ensure feature consistency. 
    157    * packageChangeHook :: Hook (!PkgInfo -> !PkgInfo -> IO ()). Called when a package name and version is replaced from an old value (first argument) to a new one (second argument). 
    158    * packageIndexChange :: Hook (IO ()): Called whenever the index tarball needs to be changed. Dubious. 
    159    * newPackageHook :: Hook (!PkgInfo -> IO ()): Called after packageAddHook only if this is the first version of this package to be put in the main index. 
    160    * noPackageHook :: Hook (!PkgInfo -> IO ()). Called after packageRemoveHook only if there are no more versions of that package left in the main index. 
    161    * tarballDownload :: Hook (!PackageId -> IO ()). Called whenever a package tarball is downloaded. 
    162  * !UserFeature provides: 
    163    * userAdded :: Hook (IO ()). More hooks to come. 
    164  * !VersionsFeature provides: 
    165    * preferredHook :: Hook (!PackageName -> !PreferredInfo -> IO ()). Called whenever the preferred versions for a package are updated. 
    166    * deprecatedHook :: Hook (!PackageName -> Maybe [!PackageName] -> IO ()). Called whenever a package is deprecated (in favor of Just pkgs) or undeprecated (Nothing). 
    167  * !TagsFeature provides: 
    168    * tagsUpdated :: Hook (Set !PackageName -> Set Tag -> IO ()). Called whenever tags are updated. The first argument is all packages affected and the second argument is all tags affected. In most cases one of these will be a singleton set. 
    169  * !ReverseFeature provides: 
    170    * reverseUpdateHook :: Hook (Map !PackageName [Version] -> IO ()). Called whenever the reverse package index is updated. It lists the packages whose revdeps index was updated, and for those packages, which versions were affected. 
    171  
    172 == Filter summary == 
    173  * !UploadFeature provides: 
    174    * canUploadPackage :: Filter (!UserId -> !UploadResult -> IO (Maybe !ErrorResponse)). Called before adding a package to the *main* index. It uses the !UploadResult type, which previews the cabal file, !GenericPackageDescription, and any upload warnings. If an Error results, the operation is aborted. 
    175  * !UserFeature provides: 
    176    * packageMutate :: Filter (!UserId -> IO Bool). Called whenever updating any kind of package index, even a non-main one. This can be used to let anyone use the social features, but only let approved people upload packages, which is the current system as of August 2010. 
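
As a sketch of how filters like the ones above might gate an update, the following runs a list of checks before an upload and aborts on the first error. Everything here is illustrative: !UserId is simplified to Int, the package is just a name string instead of an !UploadResult, and uploadersOnly, nonEmptyName and checkUpload are hypothetical.

```haskell
import Data.Maybe (catMaybes, listToMaybe)

-- Simplified stand-ins (assumptions for this sketch).
type UserId = Int
newtype ErrorResponse = ErrorResponse String deriving (Eq, Show)

-- A filter inspects the request and may veto it with an error.
type UploadFilter = UserId -> String -> IO (Maybe ErrorResponse)

-- Hypothetical filter: only approved users may upload.
uploadersOnly :: [UserId] -> UploadFilter
uploadersOnly approved uid _pkg
    | uid `elem` approved = return Nothing
    | otherwise           = return (Just (ErrorResponse "not an approved uploader"))

-- Hypothetical filter: reject empty package names.
nonEmptyName :: UploadFilter
nonEmptyName _uid pkg
    | null pkg  = return (Just (ErrorResponse "empty package name"))
    | otherwise = return Nothing

-- Run all filters; the first error (in list order) aborts the operation.
checkUpload :: [UploadFilter] -> UserId -> String -> IO (Either ErrorResponse ())
checkUpload filters uid pkg = do
    results <- mapM (\f -> f uid pkg) filters
    return (maybe (Right ()) Left (listToMaybe (catMaybes results)))
```

A caller would perform the actual index update only when checkUpload returns Right (), and otherwise serve the !ErrorResponse.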
    177  
    178 == General deficiencies == 
    17928 * Routing is untyped (stringly typed), a common but unfortunate situation for most routing libraries which vanilla Happstack manages to avoid. This is a necessary evil when Haskell lacks inclusion polymorphism and subtyping (e.g. the Restlet approach [http://www.restlet.org/documentation/1.1/tutorial#part11 hides some of the complexity] with subclasses), but combinator functions can help alleviate it. For example, withPackageName converts a list of string path components into a !PackageName, and other variants of with* functions do data querying as well. 
    18029 * Server-side and client-side caching is only ad-hoc presently. 
    181  * grep -R 'TODO\|FIXME' .  —and expect a screenful or two 
     30 * grep -R 'TODO\|FIXME'; look at the TODO file 
    18231 * User groups have a messy interface. Although each feature should define and manage its own, it is still desirable to map a user to the groups they are in. Presently this is done by mapping a !UserId to a bunch of URIs, and each URI to a group description. 
    18332 * Too many cross-cutting Happstack queries. Features should hide Happstack queries and updates in their own wrappers, which may also call hooks and do any related maintenance. 
    18433 * The system of relying on hooks and filters to keep features in sync is fragile, and it's easy to miss something and get inconsistent results. Perhaps there should be a package mapping which has the express purpose of syncing with the main index. This is one of the disadvantages of using native Haskell types, though the advantages are plenty. 
    18534 * As ever, eliminate as much boilerplate as sanely possible. 
     35}}}