Safe Haskell	Safe-Inferred

Network.URLb

Description

URL parser, following RFC 3986 (http://tools.ietf.org/html/rfc3986).

Documentation

data URL

Per RFC 3986, Section 1.1.3, the term URL "refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism".

A breakdown of URLs, per the diagram from RFC 3986:

     foo://example.com:8042/over/there?name=ferret#nose
     \_/   \______________/\_________/ \_________/ \__/
      |           |            |            |        |
   scheme     authority       path        query   fragment
      |   _____________________|__
     / \ /                        \
     urn:example:animal:ferret:nose

For the most part, the parts of a URL are strings in which certain characters must be percent encoded. The scheme is especially restricted in what it may contain:

  scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

Note that percent encoding is not allowed in the scheme.
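The scheme production can be checked character by character. The following is a minimal sketch, assuming a hypothetical helper named validScheme that is not part of this module; it uses only base and bytestring.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)
import qualified Data.ByteString.Char8 as BC

-- ALPHA from the RFC grammar: ASCII letters only.
isAlphaAscii :: Char -> Bool
isAlphaAscii c = isAsciiLower c || isAsciiUpper c

-- Characters allowed after the first one: ALPHA / DIGIT / "+" / "-" / "."
isSchemeTail :: Char -> Bool
isSchemeTail c = isAlphaAscii c || isDigit c || c `elem` "+-."

-- Hypothetical helper (not exported by Network.URLb): True when the whole
-- input matches the scheme production.
validScheme :: BC.ByteString -> Bool
validScheme bs = case BC.uncons bs of
  Nothing        -> False
  Just (c, rest) -> isAlphaAscii c && BC.all isSchemeTail rest
```

Because '%' is not in the allowed set, an input such as "ht%74p" is rejected rather than decoded.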

The authority section, nominally of the form userinfo@host:port, is in fact quite flexible: percent encoding is allowed in both the userinfo and the hostname. Only the port is restricted to a byte range, namely the ASCII digits.
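As an illustration of the digits-only restriction on the port, here is a small sketch that reads a port into a Word16, the type used for the port field of Authority. The name parsePortSketch is hypothetical, and rejecting an empty string or a value above 65535 is an assumption of this sketch, not documented behaviour of the module.

```haskell
import Data.Char (isDigit)
import Data.Word (Word16)

-- Hypothetical helper: accept only ASCII digits and only values that fit
-- in a Word16. An empty string is rejected here (an assumption).
parsePortSketch :: String -> Maybe Word16
parsePortSketch s
  | null s || not (all isDigit s) = Nothing
  | n > 65535                     = Nothing
  | otherwise                     = Just (fromIntegral n)
  where
    -- Integer is unbounded, so long digit strings cannot overflow here.
    n = read s :: Integer
```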

Since this datatype represents the data in a URL and not its particular encoded form, we use ByteString liberally.

Constructors

URL
Fields
  scheme :: Scheme
  authority :: Maybe Authority
  path :: ByteString
  query :: ByteString
  fragment :: ByteString

Instances

Eq URL
Ord URL
Show URL
IsString URL
Parse URL
Encode URL

data Authority

Constructors

Authority
Fields
  userinfo :: ByteString
  host :: ByteString
  port :: Maybe Word16

Instances

Eq Authority
Ord Authority
Show Authority
IsString Authority
Parse Authority
Encode Authority

newtype Scheme

Constructors

Scheme ByteString

Instances

Eq Scheme
Ord Scheme
Show Scheme
IsString Scheme
Parse Scheme
Encode Scheme
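To show how the three datatypes fit together, the sketch below declares local stand-ins that mirror the documented constructors and fields, then builds a value for the example URL from the RFC diagram. These are not the library's own definitions. Storing the path without its leading separator slash and using an empty userinfo are assumptions of this sketch.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word16)

-- Local stand-ins mirroring the documented shapes (not the library source).
newtype Scheme = Scheme ByteString
  deriving (Eq, Ord, Show)

data Authority = Authority
  { userinfo :: ByteString
  , host     :: ByteString
  , port     :: Maybe Word16
  } deriving (Eq, Ord, Show)

data URL = URL
  { scheme    :: Scheme
  , authority :: Maybe Authority
  , path      :: ByteString
  , query     :: ByteString
  , fragment  :: ByteString
  } deriving (Eq, Ord, Show)

-- foo://example.com:8042/over/there?name=ferret#nose
example :: URL
example = URL
  { scheme    = Scheme (BC.pack "foo")
  , authority = Just (Authority BC.empty (BC.pack "example.com") (Just 8042))
  , path      = BC.pack "over/there"  -- assumption: separator slash not stored
  , query     = BC.pack "name=ferret"
  , fragment  = BC.pack "nose"
  }
```

The fields hold decoded data, which is why plain ByteString values are enough here.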

class Encode t where

Class for encoding items from this module as URLs.

Methods

encode :: t -> ByteString

Instances

Encode Scheme
Encode Authority
Encode URL

class Parse t where

Class for parsing URL-related datatypes.

Methods

parser :: Parser t

Instances

Parse Scheme
Parse Authority
Parse URL

userinfoOctet :: Word8 -> Bool

 *( unreserved / pct-encoded / sub-delims / ":" )

userinfoP :: Parser ByteString

regNameOctet :: Word8 -> Bool

 *( unreserved / pct-encoded / sub-delims )
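The two octet classes above differ only in the colon. A sketch of how such predicates might be written follows, assuming that pct-encoded sequences are handled separately and that the predicates classify literal octets only. All names here are hypothetical stand-ins, not the module's definitions.

```haskell
import Data.Char (chr, isAsciiLower, isAsciiUpper, isDigit)
import Data.Word (Word8)

toChar :: Word8 -> Char
toChar = chr . fromIntegral

-- unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
unreservedOctet :: Word8 -> Bool
unreservedOctet w =
  isAsciiLower c || isAsciiUpper c || isDigit c || c `elem` "-._~"
  where c = toChar w

-- sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
subDelimOctet :: Word8 -> Bool
subDelimOctet w = toChar w `elem` "!$&'()*+,;="

-- Literal octets of reg-name: unreserved / sub-delims
regNameOctetSketch :: Word8 -> Bool
regNameOctetSketch w = unreservedOctet w || subDelimOctet w

-- Literal octets of userinfo: the reg-name set plus ":"
userinfoOctetSketch :: Word8 -> Bool
userinfoOctetSketch w = regNameOctetSketch w || toChar w == ':'
```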

regNameP :: Parser ByteString

percent :: Parser Word8

pathRootlessP :: Parser ByteString

Paths are quite sophisticated, with five productions to handle the different URI contexts in which they appear. For the purpose of URL parsing, however, we can assume that paths are always separated from the authority (even the empty authority) by a / and can therefore work with a relatively simple subset of the productions in the RFC.

  path-rootless = segment-nz *( "/" segment )

  ...

  segment-nz    = 1*pchar

  ...

  pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

Although the RFC does not permit a literal slash run at the start of such a path, equivalent content can be expressed with percent encoding.
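The path-rootless production can be checked with a few lines of plain Haskell. This is a sketch over String, using only base, with hypothetical names. It validates the encoded form and does not decode anything.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, isHexDigit)

-- Literal pchar: unreserved / sub-delims / ":" / "@"
pcharLiteral :: Char -> Bool
pcharLiteral c =
  isAsciiLower c || isAsciiUpper c || isDigit c
    || c `elem` "-._~!$&'()*+,;=:@"

-- segment = *pchar, where a pchar may also be "%" followed by two hex digits.
validSegment :: String -> Bool
validSegment []                  = True
validSegment ('%' : a : b : cs)  = isHexDigit a && isHexDigit b && validSegment cs
validSegment (c : cs)            = pcharLiteral c && validSegment cs

-- Split on "/" and keep empty pieces.
splitOnSlash :: String -> [String]
splitOnSlash s = case break (== '/') s of
  (seg, [])       -> [seg]
  (seg, _ : rest) -> seg : splitOnSlash rest

-- path-rootless = segment-nz *( "/" segment )
validPathRootless :: String -> Bool
validPathRootless s = case splitOnSlash s of
  (first : rest) -> not (null first) && all validSegment (first : rest)
  []             -> False
```

A leading slash makes the first segment empty, so "/over" is rejected, while "%2Fover" is accepted because the slash is percent encoded.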

segmentOctet :: Word8 -> Bool

To parse the authority and path, we either parse an authority and then optionally a slash and a path, or parse a single slash and then optionally a path.

authorityPath :: Parser (Maybe Authority, ByteString)
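The two alternatives described for the authority and path can be sketched with ReadP from base. This is not the module's parser: it assumes that the authority alternative begins with "//" as in the RFC's hier-part, keeps the authority as a raw string, and does no percent decoding. The names authorityPathSketch and runSketch are hypothetical.

```haskell
import Text.ParserCombinators.ReadP
  (ReadP, char, munch, option, readP_to_S, string, (<++))

-- Try "authority, then optionally a slash and a path" first; fall back to
-- "a single slash, then optionally a path".
authorityPathSketch :: ReadP (Maybe String, String)
authorityPathSketch = withAuthority <++ slashOnly
  where
    withAuthority = do
      _    <- string "//"              -- assumption: authority follows "//"
      auth <- munch (`notElem` "/?#")  -- may be empty (the empty authority)
      p    <- option "" (char '/' *> pathChars)
      return (Just auth, p)
    slashOnly = do
      _ <- char '/'
      p <- pathChars
      return (Nothing, p)
    pathChars = munch (`notElem` "?#")

-- Keep only a parse that consumes the whole input.
runSketch :: String -> Maybe (Maybe String, String)
runSketch s = case [r | (r, "") <- readP_to_S authorityPathSketch s] of
  (r : _) -> Just r
  []      -> Nothing
```

In this sketch the separating slash is consumed and not returned as part of the path.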

queryFragmentOctet :: Word8 -> Bool

queryFragmentP :: Parser ByteString

usingOnly :: Int -> Parser t -> Parser t

withPercents :: (Word8 -> Bool) -> Parser ByteString

Parse a byte stream, accepting either literal bytes that match the predicate or any percent-encoded octet.
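The idea can be sketched without a parser library. The hypothetical function below consumes the longest prefix made of literal bytes accepted by the predicate or of %XX sequences, and returns the decoded octets together with the unconsumed input. That the real withPercents decodes the percent sequences is an assumption here.

```haskell
import Data.Char (digitToInt, isHexDigit, ord)
import Data.Word (Word8)

-- Hypothetical stand-in for the behaviour described above.
decodeWith :: (Word8 -> Bool) -> String -> ([Word8], String)
decodeWith ok = go
  where
    -- A percent sign followed by two hex digits is always accepted.
    go ('%' : a : b : rest)
      | isHexDigit a && isHexDigit b =
          let (ws, leftover) = go rest
          in (fromIntegral (16 * digitToInt a + digitToInt b) : ws, leftover)
    -- A literal byte is accepted only if the predicate allows it.
    go (c : rest)
      | ord c < 256 && ok (fromIntegral (ord c)) =
          let (ws, leftover) = go rest
          in (fromIntegral (ord c) : ws, leftover)
    -- Anything else ends the match.
    go leftover = ([], leftover)
```

With a predicate that accepts only lowercase ASCII letters, the input "a%20b?c" yields the octets 97, 32, 98 and leaves "?c" unconsumed.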

percentEncode :: Word8 -> ByteString

Transform any octet to its percent encoded form.

selectiveEncode :: (Word8 -> Bool) -> ByteString -> ByteString

Percent encode a ByteString, leaving octets that match the predicate unencoded.
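A minimal sketch of both encoding steps follows, using base and bytestring only. The names are hypothetical stand-ins, and the choice of uppercase hexadecimal digits is an assumption (RFC 3986 recommends uppercase, but the module's output is not documented here).

```haskell
import Data.Bits (shiftR, (.&.))
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- Look up the ASCII code of one hex digit (argument must be 0..15).
hexDigit :: Word8 -> Word8
hexDigit n = B.index (BC.pack "0123456789ABCDEF") (fromIntegral n)

-- Any octet becomes "%" followed by two hex digits (37 is the code of '%').
percentEncodeSketch :: Word8 -> B.ByteString
percentEncodeSketch w =
  B.pack [37, hexDigit (w `shiftR` 4), hexDigit (w .&. 15)]

-- Octets matching the predicate pass through; all others are encoded.
selectiveEncodeSketch :: (Word8 -> Bool) -> B.ByteString -> B.ByteString
selectiveEncodeSketch keep = B.concatMap step
  where
    step w
      | keep w    = B.singleton w
      | otherwise = percentEncodeSketch w
```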

concatNonEmpty :: ByteString -> ByteString -> ByteString

pathEncode :: ByteString -> ByteString

Slash runs are not allowed in encoded paths. Here, this is interpreted to mean that the first slash in path data, which would come after the slash separating the path and the scheme or authority, should be escaped.
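One reading of this rule is that a slash at the very start of the path data is written as %2F, so that it cannot form a run with the separator slash. The sketch below covers only that step and leaves every other octet untouched. The name escapeLeadingSlash is hypothetical, and treating "the first slash" as a leading slash is an interpretation, not confirmed behaviour of pathEncode.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical helper: escape a slash that starts the path data.
escapeLeadingSlash :: BC.ByteString -> BC.ByteString
escapeLeadingSlash p = case BC.uncons p of
  Just ('/', rest) -> BC.append (BC.pack "%2F") rest
  _                -> p
```

Under this reading, path data "/etc" would appear after the separator as "/%2Fetc" rather than "//etc".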

fromString' :: Parse a => String -> Either String a

fromRight :: Either [Char] t -> t