This is a parser for HTML documents. Unlike an XML parser, it must include a certain amount of error correction to cope with HTML features such as self-terminating tags, unterminated tags, and incorrectly nested tags. The input is tokenised by the XML lexer; a separate lexer is not required for HTML.
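To make the kind of error correction described above concrete, here is a minimal Python sketch of one common approach: keeping a stack of open elements while walking a token stream. It is an illustration only, not this parser's actual algorithm. The token shape (`("open", name)`, `("close", name)`, `("text", s)`), the `VOID_ELEMENTS` set, and the `build_tree` name are all assumptions made for the example.

```python
# Hypothetical set of self-terminating (void) elements; the real parser's list may differ.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}

def build_tree(tokens):
    """Build a nested (name, children) tree from a flat token stream,
    tolerating void tags, unterminated tags, and mis-nested close tags."""
    root = ("#root", [])
    stack = [root]  # currently open elements, innermost last
    for kind, value in tokens:
        if kind == "text":
            stack[-1][1].append(value)
        elif kind == "open":
            node = (value, [])
            stack[-1][1].append(node)
            # Void elements never get a close tag, so they are not left open.
            if value not in VOID_ELEMENTS:
                stack.append(node)
        elif kind == "close":
            open_names = [name for name, _ in stack[1:]]
            if value in open_names:
                # Implicitly close any elements opened inside the one being closed.
                while stack[-1][0] != value:
                    stack.pop()
                stack.pop()
            # Otherwise the close tag matches nothing open and is ignored.
    # Anything still on the stack at end of input is treated as closed.
    return root[1]
```

For example, the stream for `<p>a<br><b>c</p></b>` yields a single `p` element containing `"a"`, an empty `br`, and a `b` holding `"c"`: the `</p>` closes the still-open `b`, and the trailing `</b>` is dropped.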
The first argument is the name of the file; the second is the contents of the file as a string. The result is the generic representation of an XML document. Any error causes the program to fail with a message on stderr.
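The calling convention described above (file name plus contents in, document out, failure reported on stderr) can be sketched in Python as follows. Everything here is hypothetical: `parse_html`, `toy_parse`, and `ParseError` are stand-ins invented for the example, and using the file name only to label the error message is an assumption, not something the description states.

```python
import sys

class ParseError(Exception):
    """Hypothetical error type raised by the stand-in parser."""

def toy_parse(contents):
    # Stand-in for the real parser: rejects empty input, otherwise
    # returns a trivial placeholder "document".
    if not contents.strip():
        raise ParseError("empty document")
    return {"root": contents.strip()}

def parse_html(filename, contents, parse=toy_parse):
    """Return the parsed document, or abort the program with a message on stderr."""
    try:
        return parse(contents)
    except ParseError as err:
        # Assumption: the file name is used only to label the message.
        print(f"{filename}: {err}", file=sys.stderr)
        sys.exit(1)
```

A caller that cannot tolerate program termination would need a variant that returns the error to the caller instead of exiting; whether such a variant exists is not stated here.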