epub-metadata: Library and utility for parsing and manipulating ePub OPF package data

[ codec, library, program, text ]

Library and utility for parsing and manipulating ePub OPF package data. This package attempts a very thorough implementation of the OPF Package Document specification. It also includes a command-line utility that dumps OPF package data to stdout in human-readable form.



Versions 1.0.2, 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.2.0.0, 2.2.0.1, 2.3.0, 2.3.1, 2.3.2, 3.0, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 5.0, 5.1, 5.2, 5.3, 5.4
Dependencies base (>=3 && <5), containers, hxt (>=9), LibZip, mtl, regex-compat
License BSD-3-Clause
Copyright 2010, 2011 Dino Morelli
Author Dino Morelli
Maintainer Dino Morelli <dino@ui3.info>
Category Codec, Text
Home page http://ui3.info/d/proj/epub-metadata.html
Uploaded by DinoMorelli at 2011-01-10T19:40:11Z
Distributions LTSHaskell:5.2, NixOS:5.2, Stackage:5.4
Reverse Dependencies 1 direct, 0 indirect
Executables epubmeta
Downloads 13241 total (59 in the last 30 days)
Status Docs uploaded by user
Build status unknown [no reports yet]

Readme for epub-metadata-2.0.2

-----
Building:

   Easy with cabal-install, of course:

      $ cabal install epub-metadata

   Or the conventional way:

      $ runhaskell Setup.hs configure
      $ runhaskell Setup.hs build
      $ runhaskell Setup.hs test
      $ runhaskell Setup.hs haddock
      $ runhaskell Setup.hs install


-----
Why was this done?

   The motivation for this project grew out of my desire to take charge
   of missing or incorrect ePub metadata in books I have purchased. I
   started out using the Calibre open source tools for examining this
   information. The limitations and incomplete implementation of those
   tools led me to build a more complete implementation in the
   programming language that I love beyond all others.


-----
Why didn't I just use existing solutions?

   - Calibre ebook-meta utility

      I experienced various problems using this software, such as:

      Incomplete and in some cases incorrect handling of tags that can
      exist more than once, particularly when they are differentiated
      using attributes according to the spec.

      Unable to display many fields in the OPF Package Document metadata
      specification. Unable to manipulate data that is represented as
      attributes of tags in the OPF spec.

      Astonishingly slow performance. The command-line tool in this
      new Haskell project is more than 45 times faster at parsing
      and displaying ePub metadata. I'm going to blame Python here for
      Calibre's performance. This has had a big impact on projects where
      I've been processing hundreds of ePubs in batch operations.

      To be fair, Calibre is trying to work with both ePub and Sony LRF
      book documents, which naturally requires a lowest-common-denominator
      approach. My focus here was to work with ePub only and to thoroughly
      support the OPF specification.


   - epub on Hackage, EPUB E-Book construction support library

      The focus of this project seems to be on building new documents,
      not parsing existing files. It also deliberately goes beyond the
      metadata, gathering up the content and other metafiles that make
      up an ePub during creation.

      Examining Codec.Ebook.OPF.Types, most of the metadata fields
      from the OPF Package Document spec are missing or aren't modeled
      thoroughly. I felt that to contribute to this project, I would
      have had to rip up the types and significantly redesign them.

      For now I felt it was better to start fresh, modelling these types
      and writing the code to manipulate them. That said, I would be
      very interested in combining the epub and epub-metadata projects
      at some point, in some way that makes sense.
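The repeated-tag problem described for Calibre is easier to see with a concrete model. In the OPF 2.0 spec, elements such as dc:creator and dc:identifier may occur several times, with attributes like opf:role, opf:file-as and opf:scheme telling the occurrences apart. The Haskell below is a minimal sketch under that assumption; the types and functions (Creator, Identifier, Metadata, creatorsWithRole, identifierByScheme, showCreator) are hypothetical illustrations and not the API that epub-metadata actually exports.

```haskell
-- Hypothetical model for illustration only; these are NOT the types
-- exported by epub-metadata.
import Data.Maybe (listToMaybe)

-- | dc:creator may appear more than once; per the OPF 2.0 spec, repeats
-- are distinguished by optional attributes such as opf:role and opf:file-as.
data Creator = Creator
  { creatorRole   :: Maybe String  -- opf:role, e.g. "aut" (author), "edt" (editor)
  , creatorFileAs :: Maybe String  -- opf:file-as, a sortable form of the name
  , creatorName   :: String        -- the element's text content
  } deriving (Eq, Show)

-- | dc:identifier may also repeat; opf:scheme names the identifier system.
data Identifier = Identifier
  { identScheme :: Maybe String  -- opf:scheme, e.g. "ISBN"
  , identValue  :: String
  } deriving (Eq, Show)

-- | Repeatable elements are kept as lists, so no occurrence is dropped.
data Metadata = Metadata
  { mdCreators    :: [Creator]
  , mdIdentifiers :: [Identifier]
  } deriving (Eq, Show)

-- | Names of every creator carrying the given opf:role, in document order.
creatorsWithRole :: String -> Metadata -> [String]
creatorsWithRole role md =
  [ creatorName c | c <- mdCreators md, creatorRole c == Just role ]

-- | The first identifier whose opf:scheme matches, if any.
identifierByScheme :: String -> Metadata -> Maybe String
identifierByScheme scheme md = listToMaybe
  [ identValue i | i <- mdIdentifiers md, identScheme i == Just scheme ]

-- | One human-readable line per creator, with its attributes shown.
showCreator :: Creator -> String
showCreator c = "creator: " ++ creatorName c ++ concatMap attr
  [ ("role", creatorRole c), ("file-as", creatorFileAs c) ]
  where
    attr (key, Just val) = " (" ++ key ++ ": " ++ val ++ ")"
    attr (_, Nothing)    = ""
```

Loaded into GHCi, creatorsWithRole "aut" applied to a Metadata value returns every author rather than only the first. Keeping each repeatable element as a list of records, attributes included, is what makes it possible to display and edit every occurrence individually.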


-----
A word about the version numbering scheme:

   4-part: major.minor.status.build
   3-part: major.status.build

   status:
      0 alpha
      1 beta
      2 release candidate
      3 release

   examples:
      1.3.0.2         v1.3 alpha build 2
      1.2.1.0         v1.2 beta build 0
      4.2.24          v4 release candidate build 24
      2.10.3.5        v2.10 release build 5 (say they were bug fixes)
      1.5.2.20090818  v1.5 release candidate, build 2009-08-18
                      (a date can even be used as the build number)
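Because the scheme is positional, it can be decoded mechanically: four dot-separated parts mean major.minor.status.build and three mean major.status.build. The sketch below assumes exactly that rule; decodeVersion, describe, Status and Version are hypothetical names used only for illustration and are not part of the package.

```haskell
-- Sketch of a decoder for the version numbering scheme described above.
-- All names here are hypothetical; epub-metadata does not export them.
import Data.Char (isDigit)
import Text.Read (readMaybe)

data Status = Alpha | Beta | ReleaseCandidate | Release
  deriving (Eq, Show)

data Version = Version
  { vMajor  :: Int
  , vMinor  :: Maybe Int  -- present only in the 4-part form
  , vStatus :: Status
  , vBuild  :: Int        -- may be a date such as 20090818
  } deriving (Eq, Show)

-- | Status digit: 0 alpha, 1 beta, 2 release candidate, 3 release.
toStatus :: Int -> Maybe Status
toStatus 0 = Just Alpha
toStatus 1 = Just Beta
toStatus 2 = Just ReleaseCandidate
toStatus 3 = Just Release
toStatus _ = Nothing

-- | Split on '.', e.g. "1.3.0.2" becomes ["1","3","0","2"].
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitDots rest

-- | Accept only non-empty runs of decimal digits.
readPart :: String -> Maybe Int
readPart p
  | not (null p) && all isDigit p = readMaybe p
  | otherwise                     = Nothing

-- | Decode a 3- or 4-part version string; Nothing if it fits neither form.
decodeVersion :: String -> Maybe Version
decodeVersion s = do
  parts <- mapM readPart (splitDots s)
  case parts of
    [maj, mnr, st, b] -> do
      stat <- toStatus st
      pure (Version maj (Just mnr) stat b)
    [maj, st, b] -> do
      stat <- toStatus st
      pure (Version maj Nothing stat b)
    _ -> Nothing

-- | Render a version the way the examples table does.
describe :: Version -> String
describe v =
  "v" ++ number ++ " " ++ statusName (vStatus v) ++ " build " ++ show (vBuild v)
  where
    number = show (vMajor v) ++ maybe "" (\m -> '.' : show m) (vMinor v)
    statusName Alpha            = "alpha"
    statusName Beta             = "beta"
    statusName ReleaseCandidate = "release candidate"
    statusName Release          = "release"
```

For example, fmap describe (decodeVersion "4.2.24") evaluates to Just "v4 release candidate build 24", matching the table. Strings that fit neither form, such as "3.0", or that use a status digit outside 0 to 3, yield Nothing.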