Pandoc, Markdown, XeLaTeX, EPUB

Updated 20-Sep-2023

An EPUB document is essentially a zipped collection of files: HTML, CSS, images, and a few XML files. There are several ways of organizing these, but the most straightforward is one HTML document for each chapter (or section), a set of images organized in a subfolder, and a few metadata files describing the collection. An EPUB can be even simpler, consisting of a single HTML file, no images, and a few metadata files.
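
To make the "zipped collection of files" concrete, here is a minimal Python sketch that lays out a one-chapter archive in memory. The OEBPS folder name, the file names inside it, and the function name are assumptions chosen for illustration. The content.opf entry is only a placeholder, so the result shows the layout rather than a valid, readable EPUB.

```python
import io
import zipfile

# container.xml tells the reading system where the package file lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>Text goes here.</p></body>
</html>
"""


def build_epub_skeleton() -> bytes:
    """Return the bytes of a zip archive laid out like a one-chapter EPUB."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry comes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        # Placeholder only: a real package file lists metadata, manifest, spine.
        archive.writestr("OEBPS/content.opf", "<!-- package file goes here -->",
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/stylesheet.css", "body { margin: 1em; }",
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


if __name__ == "__main__":
    with zipfile.ZipFile(io.BytesIO(build_epub_skeleton())) as archive:
        for name in archive.namelist():
            print(name)
```

In practice Pandoc writes all of these files itself; the sketch is only meant to show what ends up inside the archive.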

The easiest way to do anything is to first get organized, and then choose the best tools for the job (which happen to be the best tools for the craftsman). In this case there is a set of tools: for writing and editing, for formatting and layout, and for examining the finished work.

Install TeX, XeLaTeX, Pandoc, Calibre, VSCode

These tools provide the editing (VSCode), typesetting and file generation (TeX, XeLaTeX, Pandoc), and auditing (Calibre E-book Editor, and a PDF viewer).

Install Pandoc

The Pandoc package shipped with Debian (and other distributions) is often very out of date. It is best to install a current release from Pandoc on GitHub or, if needed, from the Unofficial Nightly Builds.

Install TeX

This is a long process, since there are several gigabytes to download. Assuming one wants the full installation (so that packages are on hand when and if they are needed), proceed as follows.

  • Download the install file
  • Unpack the install file
  • Navigate to the unpacked install directory and run ./install-tl
  • Select options and begin

Be sure that XeLaTeX is installed (it allows for multiple fonts and supports South Asian scripts), and know how to upgrade the TeX installation later.

Make sure the directory holding the TeX binaries is exported on the PATH, so that commands such as xelatex can be found.
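
A quick way to confirm the PATH is set up is to ask which executables can be found. The following is a small sketch using only the Python standard library. The list of tool names is an assumption about what this workflow needs; adjust it to taste.

```python
import shutil

# Executables this workflow relies on (an assumed list, not an official one).
REQUIRED_TOOLS = ("pandoc", "xelatex", "tlmgr")


def locate_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools().items():
        print(f"{name:10} {path or 'NOT FOUND on PATH'}")
```

If xelatex is reported as not found after installing TeX, the PATH export is the first thing to check.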

Install VSCode

Download VSCode from Microsoft.

Elements of a Publication

There are several elements of a publication which recur from one publication to another. It is best to get organized.

  • Metadata: Title, Subtitle, BISAC code(s), pages, date of publication, date of revision, Author, etc.
    • There are two files for this in ebook generation: title.txt and metadata.xml
  • Cover image (this will be in several different sizes depending where it is used)
  • Title Page, Copyright, Dedication, Preface, Introduction, Content (Chapters), Acknowledgments, Glossary, Index
    • Note that the above sections can be one document, or several documents
  • Any fonts specifically used or embedded: /fonts/
  • Additional images (figures, tables): /images/
  • Stylesheet: stylesheet.css
  • Table of contents (should be generated)
    • Note that there are two tables of contents: one presented automatically by the reader, and another as an HTML page within the book (helpful for PDF files as well).
  • For a print edition, a full cover is needed, and one will want ISBN barcodes and perhaps a QR code as well.
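
The elements above map fairly directly onto Pandoc's EPUB options. The following Python sketch assembles such a command as a list of arguments. The function name, the chapter file names, and the cover path images/cover.jpg are hypothetical. The options themselves (--toc, --epub-metadata, --epub-cover-image, --css, --epub-embed-font) are standard Pandoc options.

```python
import shlex


def epub_command(chapters, fonts=(), output="book.epub",
                 cover="images/cover.jpg"):
    """Build a pandoc argument list for an EPUB from the project files."""
    cmd = [
        "pandoc",
        f"--output={output}",
        "--toc",                           # generate the table of contents
        "--epub-metadata=metadata.xml",    # extra metadata
        f"--epub-cover-image={cover}",     # hypothetical cover location
        "--css=stylesheet.css",
    ]
    # One --epub-embed-font option per font file to embed.
    cmd += [f"--epub-embed-font={font}" for font in fonts]
    # title.txt goes first so its title block applies to the whole book.
    cmd.append("title.txt")
    cmd += list(chapters)
    return cmd


if __name__ == "__main__":
    command = epub_command(["ch01.md", "ch02.md"],
                           fonts=["fonts/Example-Regular.ttf"])
    print(shlex.join(command))
    # To run it for real: subprocess.run(command, check=True)
```

Building the command as a list keeps file names with spaces safe and makes it easy to reuse the same chapter list for other output formats.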

Edit, Transform, Publish

Editing all the HTML by hand can be tedious. Much of the markup is better managed with markup languages such as Pandoc-flavored Markdown and XeLaTeX, and with text transformation tools such as Pandoc. If no extended external font support is needed, Markdown alone plus a few YAML/XML files is all that is required.

  • Note that XeLaTeX allows some extensive formatting; ShareLaTeX's nifty sample templates are one example.
  • For Markdown, stick with Markdown Extra or Pandoc's Markdown, which are very similar. With Pandoc's Markdown, LaTeX commands are passed through (e.g., when compiling to PDF), or ignored (e.g., when compiling to ebook).
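
As an illustration of the pass-through behavior described above, the sketch below pairs a small Markdown sample containing a raw LaTeX command with the Pandoc command for a PDF built through XeLaTeX. The sample text, the function name, and the font name are assumptions. --pdf-engine and --variable are standard Pandoc options.

```python
import shlex

# A Markdown sample with one raw LaTeX command. Per the note above, the
# command reaches LaTeX in a PDF build and is ignored in an ebook build.
SAMPLE_MARKDOWN = """\
# Chapter One

Some text before a forced page break.

\\newpage

Text that starts on a new page in the PDF.
"""


def pdf_command(source, output="book.pdf", mainfont=None):
    """Build a pandoc argument list for a PDF typeset with XeLaTeX."""
    cmd = ["pandoc", source, f"--output={output}", "--pdf-engine=xelatex"]
    if mainfont:
        # XeLaTeX can use installed system fonts by name.
        cmd.append(f"--variable=mainfont={mainfont}")
    return cmd


if __name__ == "__main__":
    print(SAMPLE_MARKDOWN)
    print(shlex.join(pdf_command("book.md", mainfont="DejaVu Serif")))
```

The same book.md can be fed to an EPUB build unchanged, which is the main attraction of keeping the LaTeX in the source to a minimum.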

Templates in the conversion process

Templates are important for control and also for formatting (though much can be managed with CSS for the ebook and with document classes for LaTeX). EPUB and PDF (for print, as well as e-reading) differ from each other in many structural ways. Hewing to their similarities helps reduce the complexity. In particular, the scrbook document class for LaTeX, and the general organization of the *book document classes (e.g., \frontmatter, \mainmatter, etc.), should be a primary guide.
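
The book-style organization mentioned above can be sketched as a minimal scrbook skeleton, held here in a Python string so it can be written out or inspected. The chapter titles and helper names are placeholders. Selecting the class through Pandoc's documentclass variable assumes the template in use honors that variable, as Pandoc's default LaTeX template does.

```python
# A minimal scrbook skeleton showing the front/main/back matter divisions.
SCRBOOK_SKELETON = r"""\documentclass{scrbook}
\begin{document}
\frontmatter
\tableofcontents
\chapter{Preface}
\mainmatter
\chapter{First Chapter}
\backmatter
\chapter{Acknowledgments}
\end{document}
"""


def matter_order(tex):
    """Return the matter commands in the order they appear in the source."""
    markers = (r"\frontmatter", r"\mainmatter", r"\backmatter")
    found = [(tex.index(marker), marker) for marker in markers if marker in tex]
    return [marker for _, marker in sorted(found)]


def scrbook_pdf_command(source, output="book.pdf"):
    """Build a pandoc argument list that asks for the scrbook class."""
    return ["pandoc", source, f"--output={output}",
            "--pdf-engine=xelatex", "--variable=documentclass=scrbook"]


if __name__ == "__main__":
    print(matter_order(SCRBOOK_SKELETON))
    print(" ".join(scrbook_pdf_command("book.md")))
```

Keeping the Markdown sources in the same front, main, back order makes it easier for the EPUB and PDF builds to share one set of files.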