| Copyright | (c) 2020 G. Eyaeb |
|---|---|
| License | BSD-3-Clause |
| Maintainer | geyaeb@protonmail.com |
| Stability | experimental |
| Portability | POSIX |
| Safe Haskell | None |
| Language | Haskell2010 |
Pdftotext.Internal
Description
Internal functions.
Synopsis
- newtype Document = Document (ForeignPtr Poppler_Document)
- data Layout
- data Page = Page {}
- data Properties = Properties {}
- openByteStringIO :: ByteString -> IO (Maybe Document)
- openFile :: FilePath -> IO (Maybe Document)
- pageIO :: Int -> Document -> IO (Maybe Page)
- pagesIO :: Document -> IO [Page]
- pagesTotalIO :: Document -> IO Int
- pdftotextIO :: Layout -> Document -> IO Text
- propertiesIO :: Document -> IO Properties
- pageTextIO :: Layout -> Page -> IO Text
Types
newtype Document
Constructors
| Document (ForeignPtr Poppler_Document) |
data Layout
Layout of text extracted from PDF.
Constructors
| Physical | Text emulates the layout of the PDF, including horizontal spaces, and preserves hyphenation; corresponds to calling `pdftotext -layout` |
| Raw | Discards horizontal spaces, preserves hyphenation; corresponds to calling `pdftotext -raw` |
| None | Discards horizontal spaces, removes hyphenation; corresponds to calling `pdftotext` without a layout argument |
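The practical difference between the three constructors is easiest to see by extracting the same file once per layout. The following is a minimal sketch, assuming the pdftotext package (and the poppler-cpp library it binds) is installed; `layouts`, `extractEachLayout` and the command-line handling are hypothetical illustrations, not part of this module.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import System.Environment (getArgs)
import Pdftotext.Internal (Document, Layout (..), openFile, pdftotextIO)

-- Hypothetical helper: each Layout constructor paired with a label
-- (a Show instance for Layout is not assumed).
layouts :: [(String, Layout)]
layouts = [("Physical", Physical), ("Raw", Raw), ("None", None)]

-- Hypothetical helper: extract the whole document once per layout.
extractEachLayout :: Document -> IO [(String, Text)]
extractEachLayout doc =
  mapM (\(label, layout) -> (,) label <$> pdftotextIO layout doc) layouts

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      mdoc <- openFile path
      case mdoc of
        Nothing -> putStrLn ("cannot open " ++ path ++ " as a PDF")
        Just doc -> do
          results <- extractEachLayout doc
          -- The character counts give a rough indication of how much
          -- horizontal padding and hyphenation each layout keeps.
          mapM_
            (\(label, txt) -> putStrLn (label ++ ": " ++ show (T.length txt) ++ " characters"))
            results
    _ -> putStrLn "usage: compare-layouts FILE.pdf"
```

Pairing each constructor with a string label keeps the sketch independent of whichever instances Layout actually derives.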
data Page
Constructors
| Page | |
data Properties
Document properties.
Since: 0.0.2.0
Constructors
| Properties | |
Loading PDFs
openByteStringIO :: ByteString -> IO (Maybe Document)
Open a PDF represented as a ByteString. If the document cannot be parsed as a valid PDF,
Nothing is returned.
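openByteStringIO covers PDFs that are already in memory rather than on disk. A minimal sketch, assuming ByteString in the signature is the strict Data.ByteString.ByteString and that the pdftotext package is installed; `openPdfBytes` is a hypothetical wrapper that turns the bare Nothing into an error message.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text.IO as TIO
import System.Environment (getArgs)
import Pdftotext.Internal (Document, Layout (..), openByteStringIO, pdftotextIO)

-- Hypothetical wrapper: report a readable error instead of Nothing
-- when the bytes cannot be parsed as a PDF.
openPdfBytes :: BS.ByteString -> IO (Either String Document)
openPdfBytes bytes = do
  mdoc <- openByteStringIO bytes
  pure $ case mdoc of
    Nothing -> Left "input could not be parsed as a PDF"
    Just doc -> Right doc

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      -- Here the bytes come from a file, but any in-memory source
      -- (an HTTP response body, a database blob, ...) works the same way.
      bytes <- BS.readFile path
      result <- openPdfBytes bytes
      case result of
        Left err -> putStrLn err
        Right doc -> pdftotextIO None doc >>= TIO.putStr
    _ -> putStrLn "usage: bytes2txt FILE.pdf"
```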
openFile :: FilePath -> IO (Maybe Document)
Open a PDF from a file. If the file does not exist or cannot be parsed as a valid PDF,
Nothing is returned.
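Because openFile folds both failure cases into Nothing, a caller needs only one error branch. A minimal sketch of a small command-line converter, assuming the pdftotext package is installed; `convert`, the usage message and the choice of the Physical layout are hypothetical illustrations.

```haskell
import Data.Text (Text)
import qualified Data.Text.IO as TIO
import System.Environment (getArgs)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)
import Pdftotext.Internal (Layout (..), openFile, pdftotextIO)

-- Hypothetical helper: open a PDF and extract its text with the Physical
-- layout. openFile returns Nothing both for a missing file and for an
-- unparsable one, so a single Left case covers both.
convert :: FilePath -> IO (Either String Text)
convert path = do
  mdoc <- openFile path
  case mdoc of
    Nothing -> pure (Left ("cannot open " ++ path ++ " as a PDF"))
    Just doc -> Right <$> pdftotextIO Physical doc

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      result <- convert path
      case result of
        Left err -> hPutStrLn stderr err >> exitFailure
        Right txt -> TIO.putStr txt
    _ -> hPutStrLn stderr "usage: pdf2txt FILE.pdf" >> exitFailure
```

If the caller must distinguish a missing file from a corrupt one, it has to check for the file's existence itself before calling openFile.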
Document functions
pageIO :: Int -> Document -> IO (Maybe Page)
Return the page with number `no` (the Int argument) from the PDF document, or Nothing if no such page exists.
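pageIO pairs naturally with pageTextIO (listed in the synopsis) when only one page is needed. A minimal sketch, assuming the pdftotext package is installed; `pageText` is a hypothetical helper. The numbering base is not stated here, so the sketch passes the user's number through unchanged and reports the total from pagesTotalIO when the page is missing.

```haskell
import Data.Text (Text)
import qualified Data.Text.IO as TIO
import System.Environment (getArgs)
import Text.Read (readMaybe)
import Pdftotext.Internal
  (Document, Layout (..), openFile, pageIO, pageTextIO, pagesTotalIO)

-- Hypothetical helper: text of page number `no`, or Nothing when
-- pageIO reports that the page does not exist.
pageText :: Layout -> Int -> Document -> IO (Maybe Text)
pageText layout no doc = do
  mpage <- pageIO no doc
  traverse (pageTextIO layout) mpage

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path, noStr] | Just no <- readMaybe noStr -> do
      mdoc <- openFile path
      case mdoc of
        Nothing -> putStrLn ("cannot open " ++ path ++ " as a PDF")
        Just doc -> do
          total <- pagesTotalIO doc
          mtext <- pageText Raw no doc
          case mtext of
            Just txt -> TIO.putStr txt
            Nothing ->
              putStrLn ("no page " ++ show no ++ " (document has " ++ show total ++ " pages)")
    _ -> putStrLn "usage: page-text FILE.pdf PAGE"
```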
pdftotextIO :: Layout -> Document -> IO Text
Extract text from the PDF document using the given Layout.
propertiesIO :: Document -> IO Properties
Extract properties from the document.
Since: 0.0.2.0