Zack Lynn - Blog - Pages, ePub and PDF...

Sun, 26 Jan 2014 11:31:42 -0800

Pages, ePub and PDF...

Below are some notes from my publisher about the publishing of "Undead Reckoning" to ePub format from my Pages document, passed along for your general amusement... Most of this info will bore the faint-of-heart, and be highly technical, so feel free to skip this article unless you're really (I mean really...) interested...

All work to publish to ePub was done using Pages '09, because the new version of Pages had some issues with ePub format. Some of those have subsequently been fixed in the newly-released Pages 5.1, but not all of them have.

The same source Pages document was used for both the ePub format and the PDF format (for print-on-demand). The Pages document had to be re-formatted for Zack's chosen print "trim" size, which was 5.06" x 7.81", one of the (many) industry standard trim sizes, and one that feels particularly good for reading (bigger than a mass-market paperback, but not huge like some). We also did lots of formatting, such as adding headers and footers, sections for chapters, etc., not only to make the PDF version look right, but also to make the chapter-marking feature of Pages' ePub export work right. For the print version, Zack chose the Baskerville font (the ePub version does not specify a font, because the reader specifies the font).

Pages exports to ePub, and does a pretty good job of it, too. For many publishers, this would probably be sufficient, but there were a few tweaks we wanted that made the raw ePub export from Pages not quite sufficient. In addition, we were using ePub for submission to the Kindle Direct Publishing (KDP) site, and some customizations had to be made to the ePub to make the ePub-to-MOBI conversion for Kindle work the way we wanted it to. For these reasons, we wrote a custom tool (a Perl script) that post-processes the exported ePub and fixes up a few things, exporting both a fixed-up "normal" ePub and a special version for KDP submission.

One area that needed fix-up was artwork. For this project we wanted to have the book artwork at the beginning and end of the book, as well as having the full-resolution cover artwork embedded in the ePub document itself. Now Pages has an ePub export option that will take the first page of a book and convert it to the cover page, while also omitting it from the inline document in the final export. But if you just "place" the artwork normally within the margins of the first page, then Pages will emit the cover with a border equal to the margin, which is very unfortunate. The work-around is to "float" the image full-page (outside the margins); however, doing this would then cause the inline version of the cover to appear huge. To work around this, we included the cover artwork page twice: once floated to full page (outside the margins), and once fitted to just the margins. When we exported to ePub, we used the option that uses the first page as the cover artwork and omits it from the export. While Pages warns during export (because the first page lies outside the margins), it still does the right thing, and you end up with an ePub whose cover artwork is full-size without a border, and whose first page is an inline version of the cover. We also made the last page the back cover artwork.

As for the scripted post-process, the script would unpack the .epub file (which is just a zip archive), apply a series of fix-ups, and then re-package (zip back up) the results.

The first fix-up compensated for a bug in Pages: exported ePubs that contain inline images do not pass ePubCheck (the industry-standard ePub validator, and a prerequisite for submission to most distributors), because Pages wraps the img tag in a div tag. We corrected this by converting the div tag to a span tag.

Since the script also exports a KDP-ready ePub file, that part of the script fixes up the embedded ISBN on the copyright page on the fly, substituting the ISBN for the Kindle version (which, for some bizarre reason that probably has more to do with Bowker sales figures than reality, needs to be a separate ISBN from the ePub version...).

Additionally, there is no way in Pages to specify chapter names other than in the text itself. For the navigation points that are not chapters, we used non-breaking spaces as the chapter name (so there would be no visible chapter name text in the PDF version), and used the script to fix up the navigation points.

From a style perspective, printed books do not have spacing after paragraphs, but in eBooks it looks nice (and makes the book more readable), so the script fixes up the CSS for the ePub so that body paragraphs have a non-zero "margin-bottom" property. Also, the metadata that the Pages export emits is minimal, so the script adds a lot of other metadata appropriate for the book.

Finally, the image files that Pages emits for the inline images are huge (Pages resizes them, for some reason), whereas most distributors have maximum book file sizes, and most readers perform poorly with huge images, so the script substitutes more appropriately-sized image files for the ones that Pages emits.
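For the curious, here is a rough sketch of the unpack, fix-up and re-package cycle described above. It is written in Python rather than Perl and is not the actual script: the function names, the regular expression, the "p" selector and the 0.5em value are all assumptions made for illustration, and only the div-to-span and margin-bottom fix-ups are shown. One detail worth knowing is that the ePub container format expects the "mimetype" entry to come first in the archive and to be stored uncompressed, so the sketch takes care of that when zipping back up.

```python
import re
import zipfile

# Assumed pattern: a <div> whose only content is a single <img>.
# The markup Pages really emits may differ (extra attributes, nesting).
DIV_IMG = re.compile(r'<div([^>]*)>(\s*<img\b[^>]*/?>\s*)</div>', re.IGNORECASE)


def div_to_span(xhtml):
    """Turn <div ...><img .../></div> into <span ...><img .../></span>."""
    return DIV_IMG.sub(r'<span\1>\2</span>', xhtml)


def add_paragraph_spacing(css, selector="p", margin="0.5em"):
    """Append a rule giving paragraphs a non-zero margin-bottom.

    The selector and margin are illustrative values, not the ones the
    real script used.
    """
    return css + "\n%s { margin-bottom: %s; }\n" % (selector, margin)


def postprocess_epub(src_path, dst_path):
    """Copy an ePub, applying the text fix-ups to its XHTML and CSS files."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        names = src.namelist()
        # Move "mimetype" to the front; the sort is stable, so every
        # other entry keeps its original order.
        names.sort(key=lambda name: name != "mimetype")
        for name in names:
            data = src.read(name)
            if name.endswith((".xhtml", ".html")):
                data = div_to_span(data.decode("utf-8")).encode("utf-8")
            elif name.endswith(".css"):
                data = add_paragraph_spacing(data.decode("utf-8")).encode("utf-8")
            # "mimetype" must be stored, not deflated.
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            dst.writestr(name, data, compress_type=method)
```

Calling postprocess_epub("in.epub", "out.epub") (file names made up) would write a fixed-up copy next to the original. A regular expression is a blunt instrument for XHTML; it is used here only because the target pattern is so narrow.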

Now for the printed version, the cover artwork (front, back and spine) is supplied separately from the book contents (in PDF), but the PDF exported from the Pages document had two front cover pages and one back cover page that we did not want appearing on the inside of the book. To that end, I wrote another post-processing script for the PDF file that removes the first two pages and the last page from the PDF; this script was written in Python, so I could use the Cocoa-Python bindings for PDF document manipulation.
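A minimal sketch of that page-trimming step might look like the following. It is not the original script. It assumes the Cocoa-Python bindings in question are PyObjC and that PDFKit's PDFDocument class is reachable through the Quartz module, so the second function only runs on a Mac with PyObjC installed. The function names are made up for the example.

```python
def pages_to_drop(page_count, leading=2, trailing=1):
    """Return the 0-based page indexes to remove, highest index first."""
    if page_count <= leading + trailing:
        raise ValueError("document has no interior pages to keep")
    drop = list(range(leading)) + list(range(page_count - trailing, page_count))
    return sorted(drop, reverse=True)


def strip_cover_pages(src_path, dst_path):
    """Write a copy of the PDF without its cover pages (macOS + PyObjC only)."""
    # Imported here so pages_to_drop() still works without PyObjC.
    from Foundation import NSURL
    from Quartz import PDFDocument

    doc = PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(src_path))
    if doc is None:
        raise IOError("could not open %s" % src_path)
    # Remove from the back so earlier removals do not shift later indexes.
    for index in pages_to_drop(doc.pageCount()):
        doc.removePageAtIndex_(index)
    if not doc.writeToFile_(dst_path):
        raise IOError("could not write %s" % dst_path)
```

Keeping the index arithmetic in its own function means the "first two, last one" rule can be checked without a PDF in hand, and the counts can be changed for a book with a different number of cover pages.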

With all of that post-processing, the workflow became:

1. export to ePub and PDF
2. run ePub post-processing script, emitting an ePub for general distribution, and another for KDP distribution
3. run ePubCheck on both, to make sure they are OK
4. run PDF post-processor
5. upload to various distributors
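The scripted middle of that workflow could be chained together with a small driver along these lines. This is only a sketch: the script names fix_epub.pl and trim_pdf.py, their arguments, and the output file naming are all hypothetical, and the ePubCheck steps assume the usual "java -jar epubcheck.jar book.epub" invocation with a non-zero exit status on validation errors. The export from Pages and the uploads stay manual.

```python
import os
import subprocess


def build_commands(epub, pdf, epubcheck_jar="epubcheck.jar"):
    """Return the post-export steps as argv lists, in the order they run."""
    base, ext = os.path.splitext(epub)
    general = base + "-general" + ext   # hypothetical output naming
    kdp = base + "-kdp" + ext           # hypothetical output naming
    return [
        # Hypothetical script name and arguments.
        ["perl", "fix_epub.pl", epub, general, kdp],
        # Validate both outputs before they go anywhere.
        ["java", "-jar", epubcheck_jar, general],
        ["java", "-jar", epubcheck_jar, kdp],
        # Hypothetical script name and arguments.
        ["python", "trim_pdf.py", pdf],
    ]


def run_all(commands):
    """Run each step in order, stopping at the first one that fails."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Building the command list separately from running it makes it easy to print the steps for a dry run before anything touches the files.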

Probably a bit over-generalized, but we wanted scripts that could be re-used for other projects...

Zack Lynn