pdftotext-0.1.0.1: Extracts text from PDF using poppler
Copyright: (c) 2020 G. Eyaeb
License: BSD-3-Clause
Maintainer: geyaeb@protonmail.com
Stability: experimental
Portability: POSIX
Safe Haskell: None
Language: Haskell2010

Pdftotext.Internal

Description

Internal functions.

Types

data Layout

Layout of text extracted from PDF.

Constructors

Physical

Emulates the physical layout of the PDF, including horizontal spacing, and preserves hyphenation; corresponds to calling pdftotext -layout.

Raw

Discards horizontal spacing but preserves hyphenation; corresponds to calling pdftotext -raw.

None

Discards horizontal spacing and removes hyphenation; corresponds to calling pdftotext without a layout argument.
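The correspondence between the constructors and the pdftotext command-line flags can be written down as a small lookup in code. The sketch below is illustrative only: it declares a local stand-in for Layout so that it compiles with base alone (with the package installed, Layout would be imported from Pdftotext.Internal instead), and layoutFlags and pdftotextArgs are hypothetical helper names that are not part of this module.

```haskell
-- Local stand-in for the library's Layout type (assumption: declared here
-- only so the sketch compiles without the pdftotext package).
data Layout = Physical | Raw | None
  deriving (Eq, Show)

-- Hypothetical helper: the command-line flag each constructor corresponds
-- to, following the constructor descriptions above.
layoutFlags :: Layout -> [String]
layoutFlags Physical = ["-layout"]
layoutFlags Raw      = ["-raw"]
layoutFlags None     = []

-- Hypothetical helper: argument list for the pdftotext command-line tool.
pdftotextArgs :: Layout -> FilePath -> [String]
pdftotextArgs layout path = layoutFlags layout ++ [path]
```

For example, pdftotextArgs Raw "in.pdf" evaluates to ["-raw","in.pdf"], while None contributes no flag at all.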

Instances

Eq Layout

Defined in Pdftotext.Internal

Methods

(==) :: Layout -> Layout -> Bool

(/=) :: Layout -> Layout -> Bool

Show Layout

Defined in Pdftotext.Internal

data Page

Constructors

Page 

Fields

Instances

Show Page

Defined in Pdftotext.Internal

Methods

showsPrec :: Int -> Page -> ShowS

show :: Page -> String

showList :: [Page] -> ShowS

data Properties

Document properties.

Since: 0.0.2.0

Instances

Show Properties

Defined in Pdftotext.Internal

Generic Properties

Defined in Pdftotext.Internal

Associated Types

type Rep Properties :: Type -> Type

type Rep Properties

Defined in Pdftotext.Internal

Loading PDFs

openByteStringIO :: ByteString -> IO (Maybe Document)

Open a PDF held in memory as a ByteString. If the data cannot be parsed as a valid PDF, Nothing is returned.

openFile :: FilePath -> IO (Maybe Document)

Open a PDF from a file. If the file does not exist or cannot be parsed as a valid PDF, Nothing is returned.
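Both loaders report failure through Maybe rather than by throwing, so callers are expected to pattern-match on the result. A minimal sketch of both, assuming the pdftotext package is installed, that ByteString here is the strict Data.ByteString type, and that example.pdf is a placeholder path:

```haskell
import qualified Data.ByteString as BS
import Pdftotext.Internal (openByteStringIO, openFile, pagesTotalIO)

main :: IO ()
main = do
  let path = "example.pdf" -- placeholder path (assumption)

  -- Load directly from disk; Nothing covers a missing or invalid file.
  fromDisk <- openFile path
  case fromDisk of
    Nothing  -> putStrLn "openFile: not a readable PDF"
    Just doc -> do
      n <- pagesTotalIO doc
      putStrLn ("openFile: " ++ show n ++ " page(s)")

  -- Load from bytes already in memory (assumes a strict ByteString).
  -- Unlike openFile, BS.readFile itself throws if the file is missing.
  bytes <- BS.readFile path
  fromBytes <- openByteStringIO bytes
  case fromBytes of
    Nothing  -> putStrLn "openByteStringIO: not a valid PDF"
    Just doc -> do
      n <- pagesTotalIO doc
      putStrLn ("openByteStringIO: " ++ show n ++ " page(s)")
```

The in-memory variant is the natural fit when the PDF arrives over the network or from a database and never touches the file system.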

Document functions

pageIO :: Int -> Document -> IO (Maybe Page)

Return the page with number no from the document, or Nothing if no such page exists.
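Because pageIO returns a Maybe, it composes with the page-level functions through traverse. The following is a sketch under assumptions: the pdftotext package is installed, Text is the strict Data.Text type, example.pdf is a placeholder path, and the page number 1 is illustrative only (this page does not state whether numbering starts at 0 or 1).

```haskell
import qualified Data.Text.IO as T
import Pdftotext.Internal (Layout (..), openFile, pageIO, pageTextIO)

main :: IO ()
main = do
  mdoc <- openFile "example.pdf" -- placeholder path (assumption)
  case mdoc of
    Nothing  -> putStrLn "could not open example.pdf"
    Just doc -> do
      -- pageIO yields Nothing for a missing page; traverse runs
      -- pageTextIO only when a page was actually returned.
      mtext <- pageIO 1 doc >>= traverse (pageTextIO None)
      maybe (putStrLn "no such page") T.putStrLn mtext
```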

pagesIO :: Document -> IO [Page]

Return all pages from document.

pagesTotalIO :: Document -> IO Int

Return number of pages contained in document.

pdftotextIO :: Layout -> Document -> IO Text

Extract text from PDF document with given Layout.
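A whole-document extraction can be sketched as follows, assuming the pdftotext package is installed, that Text is the strict Data.Text type, and that example.pdf is a placeholder path. Swapping Physical for Raw or None changes only the layout handling.

```haskell
import qualified Data.Text.IO as T
import Pdftotext.Internal (Layout (..), openFile, pdftotextIO)

main :: IO ()
main = do
  mdoc <- openFile "example.pdf" -- placeholder path (assumption)
  case mdoc of
    Nothing  -> putStrLn "could not open example.pdf"
    -- Physical keeps the horizontal spacing of the original pages.
    Just doc -> pdftotextIO Physical doc >>= T.putStr
```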

propertiesIO :: Document -> IO Properties

Extract properties from the document.

Since: 0.0.2.0
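Since Properties has a Show instance, the simplest way to inspect a document's properties is to print them. A minimal sketch, assuming the pdftotext package is installed and using example.pdf as a placeholder path:

```haskell
import Pdftotext.Internal (openFile, propertiesIO)

main :: IO ()
main = do
  mdoc <- openFile "example.pdf" -- placeholder path (assumption)
  case mdoc of
    Nothing  -> putStrLn "could not open example.pdf"
    -- print relies on the Show Properties instance listed above.
    Just doc -> propertiesIO doc >>= print
```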

Page functions

pageTextIO :: Layout -> Page -> IO Text

Extract text from a page with given Layout.
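Combined with pagesIO, this allows per-page processing. The sketch below reports the length of the text on each page; it assumes the pdftotext package is installed, that Text is the strict Data.Text type, and that example.pdf is a placeholder path. The counter i is simply the position in the returned list, not a value read from the Page itself.

```haskell
import Control.Monad (forM_)
import qualified Data.Text as T
import Pdftotext.Internal (Layout (..), openFile, pageTextIO, pagesIO)

main :: IO ()
main = do
  mdoc <- openFile "example.pdf" -- placeholder path (assumption)
  case mdoc of
    Nothing  -> putStrLn "could not open example.pdf"
    Just doc -> do
      pages <- pagesIO doc
      -- Pair each page with its position in the list for reporting.
      forM_ (zip [1 :: Int ..] pages) $ \(i, page) -> do
        txt <- pageTextIO Raw page
        putStrLn ("page " ++ show i ++ ": " ++ show (T.length txt) ++ " characters")
```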