Portability	portable
Stability	experimental
Safe Haskell	None

Network.Api.Arxiv

Contents

Request
Expression Example
Queries
Response
A complete Example using http-conduit

Description

The ArXiv API is split in two parts: Request and Response. The Request part contains a simple language to define queries, a query parser and some helpers to navigate through the results of a mutlipage query (which, in fact, induces a new query).

The Response part contains an API to access the fields of the result based on TagSoup.

This library does not contain functions to actually execute and manage http requests. It is intended to be used with existing http libraries such as http-conduit. An example how to use the ArXiv library with http-conduit is included in this documentation.

Synopsis

Request

Requests are URL parameters, either "search_query" or "id_list". This module provides functions to build and parse these parameters, to create the full request string and to navigate through a multi-page request with a maximum number of items per page.

For details of the Arxiv request format, please refer to the Arxiv documentation.

baseUrl :: String Source

The Arxiv base URL "arxiv.org"

apiUrl :: String Source

The Arxiv API URL "export.arxiv.org/api"

apiQuery :: String Source

The query string ("search_query=" or "id_list=")

data Field Source

Field data type; a field consist of a field identifier (author, title, etc.) and a list of search terms.

Constructors

Ti [Term]	Title
Au [Term]	Author
Abs [Term]	Abstract
Co [Term]	Comment
Jr [Term]	Journal
Cat [Term]	Category
Rn [Term]	Report Number
Id [Term]	Article identifier
All [Term]	Any of the above

Instances

Eq Field
Show Field

data Expression Source

Expression data type. An expression is either a field or a logical connection of two expressions using the basic operators AND, OR and ANDNOT.

Constructors

Exp Field	Just a field
And Expression Expression	Logical "and"
Or Expression Expression	Logical "or"
AndNot Expression Expression	Logical "and . not"

Instances

Eq Expression
Show Expression

(/*/) :: Expression -> Expression -> Expression Source

AND operator. The symbol was chosen because 0 * 1 = 0.

(/+/) :: Expression -> Expression -> Expression Source

OR operator. The symbol was chosen because 0 + 1 = 1.

(/-/) :: Expression -> Expression -> Expression Source

ANDNOT operator. The symbol was chosen because 1 - 1 = 0 and 1 - 0 = 1.

Expression Example

Expressions are intended to ease the construction of well-formed queries in application code. A simple example:

 let au = Exp $ Au ["Knuth"]
     t1 = Exp $ Ti ["The Art of Computer Programming"]
     t2 = Exp $ Ti ["Concrete Mathematics"]
     ex = au /*/ (t1 /+/ t2)
  in ...

Queries

data Query Source

Query data type.

You usually want to create a query like:

 let e = (Exp $ Au ["Aaronson"]) /*/ (
           (Exp $ Ti ["quantum"]) /+/
           (Exp $ Ti ["complexity"]))
  in Query {
        qExp   = Just e,
        qIds   = ["0902.3175v2","1406.2858v1","0912.3825v1"],
        qStart = 0,
        qItems = 10}

Constructors

Query
Fields qExp :: Maybe Expression The query expression qIds :: [Identifier] Id List qStart :: Int The first item we want to see qItems :: Int The number of items we want to see

Instances

Eq Query
Show Query

nextPage :: Query -> Query Source

Prepares the query to fetch the next page adding "items per page" to "start index".

parseQuery :: String -> Either String Expression Source

Parses an expression from a string. Please refer to the Arxiv documentation for details on query format.

Just a minor remark here: The operators OR, AND and ANDNOT are case sensitive. "andnot" would be interpreted as part of a title, for instance: "ti:functional+andnot+object+oriented" is just one title; "ti:functional+ANDNOT+object+oriented" would cause an error, because a field identifier (ti, au, etc.) is expected after "+ANDNOT+".

The other way round: the field content itself is not case sensitive, i.e. "ti:complexity" or "au:aaronson" is the same as "ti:Complexity" and "au:Aaronson" respectively. This is a feature of the very arXiv API.

You may want to refer to the comments under preprocess and exp2str for some more details on our interpretation of the Arxiv documentation.

preprocess :: String -> String Source

This is an internal function used by parseQuery. It may be occasionally useful for direct use: It replaces " ", "(", ")" and """ by "+", "%28", "%29" and "%22" respectively.

Usually, these substitutions are performed when transforming a string to an URL, which should be done by your http library anyway (e.g. http-conduit). But this step is usually after parsing has been performed on the input string. (Considering a work flow like: parseQuery >>= mkQuery >>= parseUrl >>= execQuery.) The parser, however, accepts only the URL-encoded characters and, thus, some preprocessing may be necessary.

The other way round, this means that you may use parentheses, spaces and quotation marks instead of the URL encodings. But be careful! Do not introduce two successive spaces - we do not check for whitespace!

parseIds :: String -> [Identifier]Source

Converts a string containing comma-separated identifiers into a list of Identifiers. As stated already: No whitespace!

mkQuery :: Query -> String Source

Generates the complete query string including URL, query expression, id list and item control

exp2str :: Expression -> String Source

Create a query string from an expression. Note that we create redundant parentheses, for instance "a AND b OR c" will be encoded as "a+AND+%28b+OR+c%29". The rationale is that the API specification is not clear on how expressions are parsed. The above expression could be understood as "a AND (b OR c)" or as "(a AND b) OR c". To avoid confusion, one should always use parentheses to group boolean expressions - even if some of these parentheses appear to be redundant under one or the other parsing strategy.

items2str :: Query -> String Source

Converts the query to a string containing only the item control

ids2str :: [Identifier] -> String Source

Converts a list of Identifier to a string with comma-separated identifiers

itemControl :: Int -> Int -> String Source

Generates the item control of a query string according to first item and results per page:

Int: Start index for this page
Int: Number of results per page.

Response

Response processing expects [Tag String] as input (see TagSoup). The result produced by your http library (such as http-conduit) must be converted to [Tag String] before the result is passed to the response functions defined here (see also the example below).

The response functions extract information from the tag soup, either in String, Int or TagSoup format.

For details of the Arxiv Feed format, please refer to the Arxiv documentation.

totalResults :: [Tag String] -> Int Source

Total results of the query

startIndex :: [Tag String] -> Int Source

Start index of this page of results

itemsPerPage :: [Tag String] -> Int Source

Number of items per page

getEntry :: [Tag String] -> ([Tag String], [Tag String])Source

Get the first entry in the tag soup. The function returns a tuple of

The entry (if any)
The rest of the tag soup following the first entry.

With getEntry, we can build a loop through all entries in the result (which is actually implemented in forEachEntry).

forEachEntry :: [Tag String] -> ([Tag String] -> r) -> [r]Source

Loop through all entries in the result feed applying a function on each one. The results are returned as list. The function is similar to map with the arguments swapped (as in Foldable forM).

Arguments:

[Tag String]: The TagSoup through which we are looping
[Tag String] -> r: The function we are applying per entry; the TagSoup passed in to the function represents the current entry.

Example:

 forEachEntry soup $ \e ->
   let y = case getYear e of
             "" -> "s.a."
             x  -> x
       a = case getAuthorNames e of
             [] -> "Anonymous"
             as -> head as ++ 
    in a ++ " (" ++ y ++ ")"

Would retrieve the name of the first author and the year of publication (like Aaronson (2013)) from all entries.

forEachEntryM :: Monad m => [Tag String] -> ([Tag String] -> m r) -> m [r]Source

Variant of forEachEntry for monadic actions.

forEachEntryM_ :: Monad m => [Tag String] -> ([Tag String] -> m ()) -> m ()Source

Variant of forEachEntryM for actions that do not return a result.

checkForError :: [Tag String] -> Either String ()Source

Checks if the feed contains an error message, i.e.

it has only one entry,
the title of this entry is "Error" and
its id field contains an error message, which is returned as Left.

Apparently, this function is not necessary, since the Arxiv site returns error feeds with status code 400 (bad request), which should be handled by your http library anyway.

exhausted :: [Tag String] -> Bool Source

Checks whether the query is exhausted or not, i.e. whether all pages have been fetched already. The first argument is the entire response (not just a part of it).

getId :: [Tag String] -> String Source

Gets the article identifier as it can be used in an "id_list" query, i.e. without the URL. The [Tag String] argument is expected to be a single entry.

getIdUrl :: [Tag String] -> String Source

Gets the full contents of the id field (which contains an URL before the article identifier). The [Tag String] argument is expected to be a single entry.

getUpdated :: [Tag String] -> String Source

Gets the contents of the "update" field in this entry, i.e. the date when the article was last updated.

getPublished :: [Tag String] -> String Source

Gets the contents of the "published" field in this entry, i.e. the date when the article was last uploaded.

getYear :: [Tag String] -> String Source

Gets the year of the "published" field in this entry.

getTitle :: [Tag String] -> String Source

Gets the title of this entry.

getSummary :: [Tag String] -> String Source

Gets the summary of this entry.

getComment :: [Tag String] -> String Source

Gets author''s comment (in "arxiv:comment") of this entry.

getJournal :: [Tag String] -> String Source

Gets the journal information (in "arxiv:journal_ref") of this entry.

getDoi :: [Tag String] -> String Source

Gets the digital object identifier (in "arxiv:doi") of this entry.

data Link Source

The Link data type

Constructors

Link
Fields lnkHref :: String The hyperlink lnkType :: String The link type (a MIME type) lnkTitle :: String The link title (e.g. "pdf" would be the link where we find the article in pdf format) lnkRel :: String the link relation (e.g. "related" would point to the related information, such as the pdf document)

Instances

Eq Link
Show Link

getLinks :: [Tag String] -> [Link]Source

Gets all links in the entry.

getPdfLink :: [Tag String] -> Maybe Link Source

Gets only the pdf link of this entry (if any).

getPdf :: [Tag String] -> String Source

Gets the hyperlink to the pdf document of this entry (if any).

data Category Source

Category data type

Constructors

Category
Fields catTerm :: String The category term (e.g. "math-ph") catScheme :: String The category scheme

Instances

Eq Category
Show Category

getCategories :: [Tag String] -> [Category]Source

Gets the categories of this entry.

getPrimaryCategory :: [Tag String] -> Maybe Category Source

Gets the primary category of this entry (if any).

data Author Source

The Author data type

Constructors

Author
Fields auName :: String Author name auFil :: String Author Affiliation

Instances

Eq Author
Show Author

getAuthors :: [Tag String] -> [Author]Source

Gets the authors of this entry.

getAuthorNames :: [Tag String] -> [String]Source

Gets the names of all authors of this entry.

A complete Example using http-conduit

 module Main
 where
 
   import qualified Network.Api.Arxiv as Ax
   import           Network.Api.Arxiv (Expression(..), 
                                       Field(..), (/*/), (/+/))
   import           Network (withSocketsDo)
   import           Network.HTTP.Conduit
   import           Network.HTTP.Types.Status
   import           Data.List (intercalate)
   import qualified Data.ByteString as B hiding (unpack) 
   import qualified Data.ByteString.Char8 as B  (unpack) 
   import           Data.Conduit (($$+-), (=$), ($$))
   import qualified Data.Conduit as C
   import qualified Data.Conduit.List as CL
   import           Text.HTML.TagSoup
   import           Control.Monad.Trans (liftIO)
   import           Control.Monad.Trans.Resource (MonadResource)
   import           Control.Applicative ((<$>))
 
   main :: IO ()
   main = withSocketsDo (execQuery makeQuery)
     
   makeQuery :: Ax.Query
   makeQuery = 
     let au = Exp $ Au ["Aaronson"]
         t1 = Exp $ Ti ["quantum"]
         t2 = Exp $ Ti ["complexity"]
         x  = au /*/ (t1 /+/ t2)
      in Ax.Query {
           Ax.qExp   = Just x,
           Ax.qIds   = [],
           Ax.qStart = 0,
           Ax.qItems = 25}
 
   type Soup = Tag String
 
   execQuery :: Ax.Query -> IO ()
   execQuery q = withManager $ \m -> searchAxv m q $$ outSnk
 
   -----------------------------------------------------------------
   -- Execute query and start a source
   -----------------------------------------------------------------
   searchAxv :: MonadResource m =>
                Manager -> Ax.Query -> C.Source m String
   searchAxv m q = do
      u   <- liftIO (parseUrl $ mkQuery q)
      rsp <- http u m 
      case responseStatus rsp of
        (Status 200 _) -> getSoup rsp >>= results m q 
        st             -> error $ "Error:" ++ show st
 
   -----------------------------------------------------------------
   -- Consume page by page
   -----------------------------------------------------------------
   getSoup :: MonadResource m => 
              Response (C.ResumableSource m B.ByteString) -> m [Soup]
   getSoup rsp = concat <$> (responseBody rsp $$+- 
                                       toSoup =$ CL.consume)
 
   -----------------------------------------------------------------
   -- Receive a ByteString and yield Soup
   -----------------------------------------------------------------
   toSoup :: MonadResource m => C.Conduit B.ByteString m [Soup] 
   toSoup = C.awaitForever (C.yield . parseTags . B.unpack)
 
   ------------------------------------------------------------------
   -- Yield all entries and fetch next page
   ------------------------------------------------------------------
   results :: MonadResource m => 
              Manager -> Ax.Query -> [Soup] -> C.Source m String
   results m q sp = 
      if Ax.exhausted sp 
        then C.yield ("EOT: " ++ show (Ax.totalResults sp) ++ 
                      " results")
        else Ax.forEachEntryM_ sp (C.yield . mkResult) 
             >> searchAxv m (Ax.nextPage q)
   
   ------------------------------------------------------------------
   -- Get data and format somehow
   ------------------------------------------------------------------
   mkResult :: [Soup] -> String
   mkResult sp = let aus = Ax.getAuthorNames sp
                     y   = Ax.getYear sp
                     tmp = Ax.getTitle sp
                     ti  = if null tmp then "No title" else tmp
                  in intercalate ", " aus ++ " (" ++ y ++ "): " ++ ti
 
   ------------------------------------------------------------------
   -- Sink results 
   ------------------------------------------------------------------
   outSnk :: MonadResource m => C.Sink String m ()
   outSnk = C.awaitForever (liftIO . putStrLn)