readability: Extracts text of main article from HTML document

[ bsd3, html, library, program, text ]

Give readability an HTML document and it will detect and extract the text of the article while removing everything unnecessary, such as menus, advertisements, and sidebars. It is more or less a reimplementation of python-readability.


Versions 0.0.1.0, 0.1.0.0
Change log CHANGELOG.md
Dependencies aeson (>=1.4 && <1.6), base (>=4.7 && <5), bytestring (==0.10.*), containers (==0.6.*), html-conduit (==1.3.*), http-conduit (==2.3.*), optparse-applicative (==0.15.*), readability, text (==1.2.*), xml-conduit (>=1.7 && <2)
License BSD-3-Clause
Copyright 2020 G. Eyaeb
Author G. Eyaeb
Maintainer geyaeb@protonmail.com
Category Text, HTML
Home page https://sr.ht/~geyaeb/haskell-readability
Bug tracker https://todo.sr.ht/~geyaeb/haskell-readability
Source repo head: hg clone https://hg.sr.ht/~geyaeb/haskell-readability
Uploaded by geyaeb at 2020-07-09T11:35:38Z
Distributions NixOS:0.1.0.0
Executables readability
Downloads 121 total (4 in the last 30 days)
Docs available
Last success reported on 2020-07-09


Readme for readability-0.1.0.0

readability

The package contains both a library and a simple executable.

Example of using the readability executable

Given an article that looks like the following image:

[image: Original HTML]

we can extract its text by calling:

$> readability https://mises.org/wire/why-central-banks-are-threat-our-savings

and we get the following HTML:

[image: Extracted text]

If we are interested in plain text, we can further use pandoc:

$> readability https://mises.org/wire/why-central-banks-are-threat-our-savings | pandoc -f html -t plain

The US personal savings rate jumped to 33 percent in April from 12.7
percent in March and 8 percent in April last year. An increase in
savings is regarded by popular economics as less expenditure on
consumption. Since consumption expenditure is considered as the main
driving force of the economy, obviously a rebound in savings, which
implies less consumption, cannot be good for economic activity, so it is
held. Saving and wealth—what is the relation?

To maintain their life and well-being, individuals require access to
consumer goods. An increase in various consumer goods permits an
increase in individuals’ living standards. What allows an increase in
the production of consumer goods is the maintenance and the enhancement
of the infrastructure of an economy. With better infrastructure, a
greater quantity and better quality of consumer goods could be generated
and more real wealth can be produced.

The enhancement and the maintenance of the infrastructure becomes
possible because of the availability of final consumer goods that
sustain the various individuals who are busy expanding and maintaining
the infrastructure. It is the producers of final consumer goods who pay
the various individuals engaged in maintenaning and enhancing the
infrastructure. The producers of final consumer goods pay these
individuals (i.e., the intermediary producers) out of the saved or
unconsumed production of final consumer goods.

Note that when a producer of final consumer goods decides to save more,
i.e., to consume less, the fall in his consumption is offset by the
increase in the consumption of individuals who are engaged in the
intermediary stages of production. This means that overall consumption
is not declining because of an increase in saving—as popular thinking
has it.

Had we not processed the article through readability, we would have gotten:

$> curl https://mises.org/wire/why-central-banks-are-threat-our-savings | pandoc -f html -t plain

Skip to main content

[Home]

Toggle navigation

-   Blog
-   Mises Wire
-   Books
-   Podcast
-   Video
-   Events
-   Store
-   Graduate Program

-   Ver en Español

Stay Connected

GO

SUPPORT MISES

JOIN OR RENEW TODAY

SUPPORT MISES

JOIN OR RENEW TODAY

Mises Wire

GET NEWS AND ARTICLES IN YOUR INBOXPrint

A

A

Home | Wire | Why Central Banks Are a Threat to Our Savings

Why Central Banks Are a Threat to Our Savings

-   [dollars]

0 Views

Tags

Money and Banking

06/25/2020Frank Shostak

The US personal savings rate jumped to 33 percent in April from 12.7
percent in March and 8 percent in April last year. An increase in
savings is regarded by popular economics as less expenditure on
consumption. Since consumption expenditure is considered as the main
driving force of the economy, obviously a rebound in savings, which
implies less consumption, cannot be good for economic activity, so it is
held. Saving and wealth—what is the relation?

We can also print only the title or the short title of the article:

$> readability --extract shortTitle https://mises.org/wire/why-central-banks-are-threat-our-savings
Why Central Banks Are a Threat to Our Savings

Or we can print all available information as JSON for further processing:

$> readability --extract all https://mises.org/wire/why-central-banks-are-threat-our-savings | jq '.'
{
  "article": "…",
  "shortTitle": "Why Central Banks Are a Threat to Our Savings",
  "title": "Why Central Banks Are a Threat to Our Savings | Mises Wire"
}
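
For further processing outside the shell, the JSON above can be loaded by any JSON parser. The following is a minimal Python sketch that assumes the output has exactly the three keys shown above (article, shortTitle, title). The Extracted class and the parse_readability_json function are hypothetical names introduced for illustration and are not part of the package. The article value in the sample is a placeholder, not real output.

```python
import json
from dataclasses import dataclass


@dataclass
class Extracted:
    """Hypothetical container for the fields printed by `--extract all`."""
    article: str
    short_title: str
    title: str


def parse_readability_json(raw: str) -> Extracted:
    # Assumes the three keys shown in the README example are present.
    data = json.loads(raw)
    return Extracted(
        article=data["article"],
        short_title=data["shortTitle"],
        title=data["title"],
    )


# Sample input shaped like the README output; the article body is a placeholder.
sample = """{
  "article": "<p>placeholder article HTML</p>",
  "shortTitle": "Why Central Banks Are a Threat to Our Savings",
  "title": "Why Central Banks Are a Threat to Our Savings | Mises Wire"
}"""

info = parse_readability_json(sample)
print(info.short_title)
```

In practice the sample string would be replaced by the text captured from the command's standard output.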

Raw HTML can also be provided on standard input when SOURCE is omitted:

$> curl -s https://mises.org/wire/why-central-banks-are-threat-our-savings | readability -e title
Why Central Banks Are a Threat to Our Savings | Mises Wire
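
The same standard-input mode could be driven from another program. Below is a rough Python sketch that assumes the readability executable is on PATH and uses only the --extract option shown in this README. The helper names build_command and extract_from_html are hypothetical, and the extract values are limited to those that appear above (title, shortTitle, all).

```python
import subprocess
from typing import List, Optional


def build_command(extract: Optional[str] = None) -> List[str]:
    # SOURCE is omitted on purpose so that readability reads HTML from stdin.
    cmd = ["readability"]
    if extract is not None:
        cmd += ["--extract", extract]
    return cmd


def extract_from_html(html: str, extract: Optional[str] = None) -> str:
    # Pipes the HTML to the executable and returns whatever it prints.
    # Raises CalledProcessError if the executable exits with a non-zero status.
    result = subprocess.run(
        build_command(extract),
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling extract_from_html(page, "title") would then correspond to the curl pipeline above, with page holding HTML that was downloaded beforehand.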

Contribute

The project is hosted at https://sr.ht/~geyaeb/haskell-readability/. The homepage provides links to the Mercurial repository, the mailing list, and the ticket tracker.

Patches, suggestions, questions, and general discussion can be sent to the mailing list. Detailed information about sending patches by email can be found at https://man.sr.ht/hg.sr.ht/email.md.