| Safe Haskell | Safe-Inferred |
|---|---|
| Language | GHC2021 |
MarkupParse
Description
Synopsis
- data Markup = Markup {
- standard :: Standard
- markupTree :: [Tree Token]
- data Standard
- markup :: Standard -> ByteString -> These [MarkupWarning] Markup
- markup_ :: Standard -> ByteString -> Markup
- data RenderStyle
- markdown :: RenderStyle -> Markup -> ByteString
- normalize :: Markup -> Markup
- wellFormed :: Markup -> [MarkupWarning]
- isWellFormed :: Markup -> Bool
- data MarkupWarning
- type Result a = These [MarkupWarning] a
- resultError :: Result a -> a
- resultEither :: Result a -> Either [MarkupWarning] a
- resultMaybe :: Result a -> Maybe a
- type TagName = ByteString
- name :: Standard -> Parser e ByteString
- selfClosers :: [TagName]
- type AttrName = ByteString
- type AttrValue = ByteString
- data Attr = Attr !AttrName !AttrValue
- attrs :: Standard -> Parser a [Attr]
- data Token
- = StartTag !TagName ![Attr]
- | EmptyElemTag !TagName ![Attr]
- | EndTag !TagName
- | Content !ByteString
- | Comment !ByteString
- | Decl !ByteString
- | Doctype !ByteString
- tokenize :: Standard -> ByteString -> These [MarkupWarning] [Token]
- tokenize_ :: Standard -> ByteString -> [Token]
- token :: Standard -> Parser String Token
- detokenize :: Standard -> Token -> ByteString
- gather :: Standard -> [Token] -> These [MarkupWarning] [Tree Token]
- gather_ :: Standard -> [Token] -> [Tree Token]
- degather :: Markup -> These [MarkupWarning] [Token]
- degather_ :: Markup -> [Token]
- xmlVersionInfo :: Parser e ByteString
- xmlEncodingDecl :: Parser e ByteString
- xmlStandalone :: Parser e ByteString
- xmlVersionNum :: Parser e ByteString
- xmlEncName :: Parser e ByteString
- xmlYesNo :: Parser e ByteString
Documentation
import MarkupParse import Data.ByteString qualified as B bs <- B.readFile "other/line.svg" m = markup_ bs
is an approximate round trip from markdown . 'markup_ByteString to Markup back to ByteString'. The underscores represent versions of main functions that throw an exception on warnings encountered along the way.
At a lower level, a round trip pipeline might look something like:
:t tokenize Html >=> gather Html >>> fmap (Markup Html >>> normalize) >=> degather >>> fmap (fmap (detokenize Html) >>> mconcat) ByteString -> These [MarkupWarning] ByteString
From left to right:
tokenizeconverts aByteStringto aTokenlist,gathertakes the tokens and gathers them intoTrees of tokens- this is then wrapped into the
Markupdata type. normalizeconcatenates content, and normalizes attributes,degatherturns the markup tree back into a token list. Finally,detokenizeturns a token back into a bytestring.
Along the way, the kleisi fishies and compose forward usage accumulates any warnings via the These monad instance.
Markup
>>>markup Html "<foo class=\"bar\">baz</foo>"That (Markup {standard = Html, markupTree = [Node {rootLabel = StartTag "foo" [Attr "class" "bar"], subForest = [Node {rootLabel = Content "baz", subForest = []}]}]})
Instances
| Generic Markup Source # | |
| Show Markup Source # | |
| NFData Markup Source # | |
Defined in MarkupParse | |
| Eq Markup Source # | |
| Ord Markup Source # | |
| ToExpr Markup Source # | |
Defined in MarkupParse | |
| type Rep Markup Source # | |
Defined in MarkupParse type Rep Markup = D1 ('MetaData "Markup" "MarkupParse" "markup-parse-0.0.0.2-HnQ2lGo9uJcI0zQ43aTniK" 'False) (C1 ('MetaCons "Markup" 'PrefixI 'True) (S1 ('MetaSel ('Just "standard") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Standard) :*: S1 ('MetaSel ('Just "markupTree") 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 [Tree Token]))) | |
From a parsing pov, Html & Xml (& Svg) are close enough that they share a lot of parsing logic, so that parsing and printing just need some tweaking.
The xml parsing logic is based on the XML productions found in https://www.w3.org/TR/xml/
The html parsing was based on a reading of html-parse, but ignores the various 'x00' to 'xfffd' & eof directives that form part of the html standards.
markup :: Standard -> ByteString -> These [MarkupWarning] Markup Source #
Convert bytestrings to Markup
>>>markup Html "<foo><br></foo><baz"These [MarkupParser (ParserLeftover "<baz")] (Markup {standard = Html, markupTree = [Node {rootLabel = StartTag "foo" [], subForest = [Node {rootLabel = StartTag "br" [], subForest = []}]}]})
data RenderStyle Source #
Indented 0 puts newlines in between the tags.
Instances
| Generic RenderStyle Source # | |
Defined in MarkupParse Associated Types type Rep RenderStyle :: Type -> Type # | |
| Show RenderStyle Source # | |
Defined in MarkupParse Methods showsPrec :: Int -> RenderStyle -> ShowS # show :: RenderStyle -> String # showList :: [RenderStyle] -> ShowS # | |
| Eq RenderStyle Source # | |
Defined in MarkupParse | |
| type Rep RenderStyle Source # | |
Defined in MarkupParse type Rep RenderStyle = D1 ('MetaData "RenderStyle" "MarkupParse" "markup-parse-0.0.0.2-HnQ2lGo9uJcI0zQ43aTniK" 'False) (C1 ('MetaCons "Compact" 'PrefixI 'False) (U1 :: Type -> Type) :+: C1 ('MetaCons "Indented" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'NoSourceStrictness 'DecidedLazy) (Rec0 Int))) | |
markdown :: RenderStyle -> Markup -> ByteString Source #
Convert Markup to bytestrings
>>>B.putStr $ markdown (Indented 4) (markup_ Html [i|<foo><br></foo>|])<foo> <br> </foo>
normalize :: Markup -> Markup Source #
concatenate sequential content, and normalize attributes; unwording class values and removing duplicate attributes (taking last).
>>>B.putStr $ markdown Compact $ normalize (markup_ Xml [i|<foo class="a" class="b" bar="first" bar="last"/>|])<foo bar="last" class="a b"/>
wellFormed :: Markup -> [MarkupWarning] Source #
Check for well-formedness and rerturn warnings encountered.
>>>wellFormed $ Markup Html [Node (Comment "") [], Node (EndTag "foo") [], Node (EmptyElemTag "foo" []) [Node (Content "bar") []], Node (EmptyElemTag "foo" []) []][EmptyContent,EndTagInTree,LeafWithChildren,BadEmptyElemTag]
isWellFormed :: Markup -> Bool Source #
Are the trees in the markup well-formed?
Warnings
data MarkupWarning Source #
markup-parse generally tries to continue on parse errors, and return what has/can still be parsed, together with any warnings.
Constructors
| BadEmptyElemTag | A tag ending with "/>" that is not an element of |
| SelfCloserWithChildren | A tag ending with "/>" that has children. Cannot happen in the parsing phase. |
| LeafWithChildren | Only a |
| TagMismatch TagName TagName | A CloseTag with a different name to the currently open StartTag. |
| UnmatchedEndTag | An EndTag with no corresponding StartTag. |
| UnclosedTag | An EndTag with corresponding StartTag. |
| EndTagInTree | An EndTag should never appear in |
| EmptyContent | Empty Content, Comment, Decl or Doctype |
| MarkupParser ParserWarning |
Instances
type Result a = These [MarkupWarning] a Source #
The structure of many returning functions.
A common computation pipeline is to take advantage of the These Monad instance eg
markup s bs = bs & (tokenize s >=> gather s) & second (Markup s)
resultError :: Result a -> a Source #
Convert any warnings to an error
>>>resultError $ (tokenize Html) "<foo"*** Exception: MarkupParser (ParserLeftover "<foo") ...
resultEither :: Result a -> Either [MarkupWarning] a Source #
Returns Left on any warnings
>>>resultEither $ (tokenize Html) "<foo><baz"Left [MarkupParser (ParserLeftover "<baz")]
resultMaybe :: Result a -> Maybe a Source #
Returns results if any, ignoring warnings.
>>>resultMaybe $ (tokenize Html) "<foo><baz"Just [StartTag "foo" []]
Token components
type TagName = ByteString Source #
Name of token
name :: Standard -> Parser e ByteString Source #
Parse a tag name. Each standard is slightly different.
selfClosers :: [TagName] Source #
Html tags that self-close
type AttrName = ByteString Source #
Name of an attribute.
type AttrValue = ByteString Source #
Value of an attribute. "" is equivalent to true with respect to boolean attributes.
An attribute of a tag
In parsing, boolean attributes, which are not required to have a value in HTML, will be set a value of "", which is ok. But this will then be rendered.
>>>detokenize Html <$> tokenize_ Html [i|<input checked>|]["<input checked=\"\">"]
Instances
| Generic Attr Source # | |
| Show Attr Source # | |
| NFData Attr Source # | |
Defined in MarkupParse | |
| Eq Attr Source # | |
| Ord Attr Source # | |
| ToExpr Attr Source # | |
Defined in MarkupParse | |
| type Rep Attr Source # | |
Defined in MarkupParse type Rep Attr = D1 ('MetaData "Attr" "MarkupParse" "markup-parse-0.0.0.2-HnQ2lGo9uJcI0zQ43aTniK" 'False) (C1 ('MetaCons "Attr" 'PrefixI 'False) (S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 AttrName) :*: S1 ('MetaSel ('Nothing :: Maybe Symbol) 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 AttrValue))) | |
Tokens
A Markup token
>>>runParser_ (many (token Html)) [i|<foo>content</foo>|][StartTag "foo" [],Content "content",EndTag "foo"]
>>>runParser_ (token Xml) [i|<foo/>|]EmptyElemTag "foo" []
>>>runParser_ (token Html) "<!-- Comment -->"Comment " Comment "
>>>runParser_ (token Xml) [i|<?xml version="1.0" encoding="UTF-8"?>|]Decl "xml version=\"1.0\" encoding=\"UTF-8\""
>>>runParser_ (token Html) "<!DOCTYPE html>"Doctype "DOCTYPE html"
>>>runParser_ (token Xml) "<!DOCTYPE foo [ declarations ]>"Doctype "DOCTYPE foo [ declarations ]"
>>>runParser (token Html) [i|<foo a="a" b="b" c=c check>|]OK (StartTag "foo" [Attr "a" "a",Attr "b" "b",Attr "c" "c",Attr "check" ""]) ""
>>>runParser (token Xml) [i|<foo a="a" b="b" c=c check>|]Fail
Constructors
| StartTag !TagName ![Attr] | A start tag. https://developer.mozilla.org/en-US/docs/Glossary/Tag |
| EmptyElemTag !TagName ![Attr] | An empty element tag. Optional for XML and kind of not allowed in HTML. |
| EndTag !TagName | A closing tag. |
| Content !ByteString | The content between tags. |
| Comment !ByteString | Contents of a comment. |
| Decl !ByteString | Contents of a declaration |
| Doctype !ByteString | Contents of a doctype declaration. |
Instances
tokenize :: Standard -> ByteString -> These [MarkupWarning] [Token] Source #
Parse a bytestring into tokens
>>>tokenize Html [i|<foo>content</foo>|]That [StartTag "foo" [],Content "content",EndTag "foo"]
token :: Standard -> Parser String Token Source #
A flatparse Token parser.
>>>runParser (token Html) "<foo>content</foo>"OK (StartTag "foo" []) "content</foo>"
detokenize :: Standard -> Token -> ByteString Source #
bytestring representation of Token.
>>>detokenize Html (StartTag "foo" [])"<foo>"
gather :: Standard -> [Token] -> These [MarkupWarning] [Tree Token] Source #
Gather together token trees from a token list, placing child elements in nodes and removing EndTags.
>>>gather Html =<< tokenize Html "<foo class=\"bar\">baz</foo>"That [Node {rootLabel = StartTag "foo" [Attr "class" "bar"], subForest = [Node {rootLabel = Content "baz", subForest = []}]}]
degather :: Markup -> These [MarkupWarning] [Token] Source #
Convert a markup into a token list, adding end tags.
>>>degather =<< markup Html "<foo class=\"bar\">baz</foo>"That [StartTag "foo" [Attr "class" "bar"],Content "baz",EndTag "foo"]
XML specific Parsers
xmlVersionInfo :: Parser e ByteString Source #
xml production [24]
xmlEncodingDecl :: Parser e ByteString Source #
xml production [80]
xmlStandalone :: Parser e ByteString Source #
Xml production [32]
xmlVersionNum :: Parser e ByteString Source #
xml production [26]
xmlEncName :: Parser e ByteString Source #
xml production [81]
xmlYesNo :: Parser e ByteString Source #
Xml yes/no