# readability: Extracts text of main article from HTML document

[ bsd3, html, library, program, text ]

Give readability an HTML document and it will detect and extract the text of the article while removing everything unnecessary, such as menus, advertisements, or sidebars. It is more or less a reimplementation of python-readability.

The package contains both a library and a simple executable.

## Example of using the readability executable

Given a web page that contains an article, we can extract the text by calling:

$> readability https://mises.org/wire/why-central-banks-are-threat-our-savings

which prints the article as HTML. If we are interested in plain text, we can further use pandoc:

$> readability https://mises.org/wire/why-central-banks-are-threat-our-savings | pandoc -f html -t plain

The US personal savings rate jumped to 33 percent in April from 12.7
percent in March and 8 percent in April last year. An increase in
savings is regarded by popular economics as less expenditure on
consumption. Since consumption expenditure is considered as the main
driving force of the economy, obviously a rebound in savings, which
implies less consumption, cannot be good for economic activity, so it is
held.

Saving and wealth—what is the relation?

consumer goods. An increase in various consumer goods permits an
increase in individuals’ living standards. What allows an increase in
the production of consumer goods is the maintenance and the enhancement
of the infrastructure of an economy. With better infrastructure, a
greater quantity and better quality of consumer goods could be generated
and more real wealth can be produced.

The enhancement and the maintenance of the infrastructure becomes
possible because of the availability of final consumer goods that
sustain the various individuals who are busy expanding and maintaining
the infrastructure. It is the producers of final consumer goods who pay
the various individuals engaged in maintenaning and enhancing the
infrastructure. The producers of final consumer goods pay these
individuals (i.e., the intermediary producers) out of the saved or
unconsumed production of final consumer goods.

Note that when a producer of final consumer goods decides to save more,
i.e., to consume less, the fall in his consumption is offset by the
increase in the consumption of individuals who are engaged in the
intermediary stages of production. This means that overall consumption
is not declining because of an increase in saving—as popular thinking
has it.


Had we not processed the article through readability, we would have gotten:

$> curl https://mises.org/wire/why-central-banks-are-threat-our-savings | pandoc -f html -t plain
Skip to main content

[Home]

Toggle navigation

- Blog
- Mises Wire
- Books
- Podcast
- Video
- Events
- Store
- Graduate Program
- Ver en Español

Stay Connected GO

SUPPORT MISES JOIN OR RENEW TODAY

SUPPORT MISES JOIN OR RENEW TODAY

Mises Wire

GET NEWS AND ARTICLES IN YOUR INBOX

Print A A

Home | Wire | Why Central Banks Are a Threat to Our Savings

Why Central Banks Are a Threat to Our Savings

- [dollars]

0 Views

Tags Money and Banking

06/25/2020
Frank Shostak

The US personal savings rate jumped to 33 percent in April from 12.7
percent in March and 8 percent in April last year. An increase in
savings is regarded by popular economics as less expenditure on
consumption. Since consumption expenditure is considered as the main
driving force of the economy, obviously a rebound in savings, which
implies less consumption, cannot be good for economic activity, so it is
held.

Saving and wealth—what is the relation?

We can also print only the title or the short title of the article:

$> readability --extract shortTitle https://mises.org/wire/why-central-banks-are-threat-our-savings
Why Central Banks Are a Threat to Our Savings
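
The same call can be wrapped in a small shell function to collect short titles for several articles at once. This is only a sketch: `short_titles` is a hypothetical helper name, and it assumes that `readability` exits with a non-zero status when it cannot fetch or parse a page, which the examples here do not confirm.

```shell
# Hypothetical helper: print "URL<TAB>short title" for each URL argument.
# Assumption: readability returns a non-zero exit status on failure.
short_titles() {
  for url in "$@"; do
    # Report and skip a URL that fails instead of aborting the whole loop.
    if title=$(readability --extract shortTitle "$url"); then
      printf '%s\t%s\n' "$url" "$title"
    else
      printf 'skipped: %s\n' "$url" >&2
    fi
  done
}
```

It would be called as `short_titles URL1 URL2 ...`; the tab-separated output is easy to pass on to tools such as `cut` or `sort`.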


Or we can print all available information as JSON for further processing:

$> readability --extract all https://mises.org/wire/why-central-banks-are-threat-our-savings | jq '.'
{
  "article": "…",
  "shortTitle": "Why Central Banks Are a Threat to Our Savings",
  "title": "Why Central Banks Are a Threat to Our Savings | Mises Wire"
}

Raw HTML can also be provided on standard input when SOURCE is omitted:

$> curl -s https://mises.org/wire/why-central-banks-are-threat-our-savings | readability -e title
Why Central Banks Are a Threat to Our Savings | Mises Wire
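
Because readability accepts HTML on standard input, a page can be downloaded once and then processed several times. The sketch below illustrates the idea. `article_summary` is a hypothetical helper name, and it assumes that `curl`, `readability` and `pandoc` are on `PATH` and that the default HTML extraction works on standard input in the same way as `-e title` does.

```shell
# Hypothetical helper: download a page once, then print its title followed
# by the plain-text article. Assumes curl, readability and pandoc on PATH.
article_summary() {
  url=$1
  tmp=$(mktemp) || return 1
  # Fetch the raw HTML a single time and keep it in a temporary file.
  if ! curl -s "$url" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # SOURCE is omitted, so readability reads the HTML from standard input.
  readability -e title < "$tmp"
  # Assumption: without -e, readability prints the article HTML for stdin too.
  readability < "$tmp" | pandoc -f html -t plain
  rm -f "$tmp"
}
```

Keeping the download in a temporary file avoids fetching the same URL once per extracted field.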


## Contribute

The project is hosted at https://sr.ht/~geyaeb/haskell-readability/ . The homepage provides links to the Mercurial repository, mailing list and ticket tracker.

Patches, suggestions, questions and general discussions can be sent to the mailing list. Detailed information about sending patches by email can be found at https://man.sr.ht/hg.sr.ht/email.md.