Ticket #1580 (closed proposed-project: fixed)
A high-performance HTML combinator library using Data.Text
| Reported by: | tibbe | Owned by: | |
|---|---|---|---|
| Priority: | good | Keywords: | |
| Cc: | | Topic: | Web Development |
| Difficulty: | unknown | Mentor: | not-accepted |
Description (last modified by tibbe)
Motivation
Being both fast and safe, Haskell would make a great replacement for e.g. Python and Ruby for server applications. However, good library support for web applications is sorely missing. To write web applications you need at least three components: a web application server, a data storage layer, and an HTML generation library. The goal of this project is to address the last of the three, as the first two are already getting some attention from other Haskell developers.
Introduction
Almost all web applications need to generate HTML for rendering in the user's browser. Perhaps the three most important properties of an HTML generation library are:
- High performance: Given that the network introduces a lot of latency, the server is left with very little time to create a response to send back to the client. Every millisecond not spent on generating HTML can be used to process the user's request. Furthermore, efficient use of the server's resources is important to keep the number of clients per server high and costs per client low.
- Correctness: Incorrectly created HTML can result in anything from incorrect rendering (in the best case) to XSS attacks (in the worst case).
- Composability: Being able to create small widgets and reuse them on several pages fosters consistency in the generated output and helps both correctness and reuse. (Being able to treat HTML fragments as values rather than as strings is important.)
Combinator libraries, like the html package on Hackage, address the last two criteria by making the generated HTML correct by construction and making HTML fragments first class values. Traditional templating systems generally have the first property, offering excellent performance, but lack the other two.
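To make "correct by construction" and "first class values" concrete, here is a minimal sketch of the combinator style. It is not the html package's API: the names Html, text, element, itemList and render are hypothetical, and the Builder-based representation is only one assumption about how a Data.Text-backed library might be laid out.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as B

-- | An HTML fragment is a value wrapping a text builder (hypothetical type).
newtype Html = Html B.Builder

-- Fragments compose with (<>) and mconcat, like any other monoid.
instance Semigroup Html where
  Html x <> Html y = Html (x <> y)

instance Monoid Html where
  mempty = Html mempty

-- | Escape the characters that are significant in HTML text.
escape :: Text -> Text
escape = T.concatMap escapeChar
  where
    escapeChar '<'  = "&lt;"
    escapeChar '>'  = "&gt;"
    escapeChar '&'  = "&amp;"
    escapeChar '"'  = "&quot;"
    escapeChar '\'' = "&#39;"
    escapeChar c    = T.singleton c

-- | The only way to turn arbitrary text into Html in this sketch;
-- it always escapes, so callers cannot forget to.
text :: Text -> Html
text = Html . B.fromText . escape

-- | Wrap a fragment in an element (attributes are omitted here).
element :: Text -> Html -> Html
element name (Html body) =
  Html (  B.singleton '<' <> B.fromText name <> B.singleton '>'
       <> body
       <> B.fromText "</" <> B.fromText name <> B.singleton '>' )

p, ul, li :: Html -> Html
p  = element "p"
ul = element "ul"
li = element "li"

-- | A reusable widget is an ordinary function from data to Html.
itemList :: [Text] -> Html
itemList = ul . mconcat . map (li . text)

-- | Run the builder once, at the very end.
render :: Html -> TL.Text
render (Html b) = B.toLazyText b
```

For example, `render (p (text "a < b") <> itemList ["x", "<script>"])` yields `<p>a &lt; b</p><ul><li>x</li><li>&lt;script&gt;</li></ul>`. Tags are always closed because only element emits them, and user text is always escaped because only text accepts it.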
Project Goals
Create a new HTML combinator library, based on the html package, that's blazingly fast, well tested and well documented. Also improve upon the html package's API by e.g. splitting the attribute-related functions into their own module. The new library ought to use Data.Text instead of String as the base type for text.
Non-Goals
- Using the very latest type system features to ensure well-formedness. This is not a research project; the goal is to create a production quality library using known techniques.
- Integrating with formlets.
Tasks
- Write an initial implementation based on the API in Text.Html (skip Text.Html.BlockTable) but with a more predictable naming scheme for combinators. For example, all combinators should have the same name as their HTML counterparts, except when they collide with reserved words, in which case an "_" is appended.
- Move the attribute combinators into Text.Html.Attribute and use namespaces instead of function prefixes to resolve name collisions.
- Write a test suite and a benchmark suite.
- With the support of the test and benchmark suite, optimize the performance of the library, perhaps using tools like ghc-core to study the generated code.
- Document the API extensively, using Haddock. Include usage examples at the top of the module.
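The naming rule and the kind of property a test suite might check can be sketched in a single file. Everything here is hypothetical: the types Html and Attribute, the helper element, and the combinators a, href and class_ are illustrations of the proposed scheme, not an existing API. In the proposed layout the attribute combinators would live in Text.Html.Attribute and be imported qualified, which a one-file sketch cannot show.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical minimal types; a real library would define richer ones.
newtype Html      = Html Text
data    Attribute = Attribute Text Text   -- attribute name, unescaped value

-- | Escape characters that are significant in text and attribute values.
escape :: Text -> Text
escape = T.concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = T.singleton c

text :: Text -> Html
text = Html . escape

renderAttr :: Attribute -> Text
renderAttr (Attribute k v) = T.concat [" ", k, "=\"", escape v, "\""]

element :: Text -> [Attribute] -> [Html] -> Html
element name attrs kids = Html (T.concat
  [ "<", name, T.concat (map renderAttr attrs), ">"
  , T.concat [t | Html t <- kids]
  , "</", name, ">" ])

-- Element combinators keep the name of their HTML counterpart.
a, p :: [Attribute] -> [Html] -> Html
a = element "a"
p = element "p"

-- Attribute combinators follow the same rule; "class" is a reserved
-- word in Haskell, so an underscore is appended.
href, class_ :: Text -> Attribute
href   = Attribute "href"
class_ = Attribute "class"

render :: Html -> Text
render (Html t) = t

-- | A property of the shape a QuickCheck suite could run (QuickCheck itself
-- is not imported here): escaped text never contains a raw '<', '>' or '"'.
prop_textIsEscaped :: String -> Bool
prop_textIsEscaped s =
  T.all (`notElem` ['<', '>', '"']) (render (text (T.pack s)))
```

With these definitions, `render (a [href "/?q=\"x\"", class_ "nav"] [text "Home & more"])` gives `<a href="/?q=&quot;x&quot;" class="nav">Home &amp; more</a>`. A predicate such as prop_textIsEscaped can be checked by hand on a few inputs, or generated inputs can be fed to it once a testing library is added.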
References
- Unicode and HTML article on Wikipedia
- The HTML 5 FAQ lists a number of interesting differences between HTML and XHTML.
Tools
- QuickCheck for testing,
- Criterion for benchmarking, and
- Haddock for documenting.
Interested Mentors
- Johan Tibell
FAQ
- Q: Why not use a templating language?
- A: While popular, templating languages suffer from a number of problems:
  - Lack of abstraction features to address code and HTML duplication. Templating languages tend to become (poor) general purpose languages over time.
  - Security issues related to leaving proper escaping to the user.
  - Well-formedness issues due to representing HTML as string fragments rather than first class values.
  - Reuse issues due to not treating HTML as first class values (i.e. reuse is achieved by textual inclusion rather than composition).
