
The Problem with H1

H1 and Content Boundaries on the Web and EBook Publications

There is a problem embedded in epub: an epub is normally composed of several HTML documents, one per chapter. Tools that build epubs (such as Pandoc) split the source on H1, so that each H1 marks the beginning (and title) of a new HTML document, which is a chapter.
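To make the splitting idea concrete, here is a minimal sketch in Python. It is not how Pandoc works internally; the function name `split_on_h1` and the regex-based approach are assumptions for illustration only, and a real tool would use a proper HTML parser.

```python
import re

def split_on_h1(html):
    """Split one HTML body into (title, fragment) chapters, one per <h1>.

    Illustrative only: regexes are good enough for a sketch, not for
    arbitrary real-world HTML.
    """
    # Zero-width split just before every opening <h1> tag (Python 3.7+).
    parts = re.split(r"(?i)(?=<h1[\s>])", html)
    chapters = []
    for part in parts:
        match = re.match(r"(?is)<h1[^>]*>(.*?)</h1>", part)
        if match is None:
            continue  # anything before the first <h1> is skipped in this sketch
        # Strip inline tags from the heading to get a plain-text chapter title.
        title = re.sub(r"<[^>]+>", "", match.group(1)).strip()
        chapters.append((title, part.strip()))
    return chapters

book = "<h1>One</h1><p>a</p><h2>Sub</h2><h1>Two</h1><p>b</p>"
chapters = split_on_h1(book)
for title, fragment in chapters:
    print(title, len(fragment))
```

Each tuple would then become one HTML file inside the epub, with the H1 text doubling as the chapter title.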

However, as we know, an HTML document (especially on the web) should have only one H1. So if the native (single) document being edited is itself a book, that single document will have multiple H1s embedded within it.

This means there is a basic disconnect between a book as an ODT file, an HTML file, or even a PDF, and a book as an ebook.

Baker & Taylor require epubs to have only one H1, which is the title of the work, with everything else at H2 or below (e.g., chapter headers). However, the spec and common use have an H1 for each chapter.


The good news here is that once a piece of software has parsed the document and has a copy in memory, it's relatively trivial to change all the H1s to H2s, H2s to H3s (and back), so this won't be a problem to deal with, and in UX Write it won't be something the user has to do manually.

Using a single H1 for the book title only is stupid IMHO. That's what the title tag in HTML is for, or you can define a custom CSS class called "title". H1 as far as I'm concerned should be for chapters.

But again, this problem is easy to get around in software, and I could perhaps provide some options for EPUB export where you can specify how you want to handle things like this.

Pushing the problem from H1 to Meta Title gives the same problem: An epub ebook has multiple html documents (one per chapter). On the web, there should be one and only one H1 (for Google's purposes, and possibly in the HTML spec). The Meta Title is not (necessarily) displayed to the user (though browsers traditionally put it into the browser title bar as well), whereas the H1 does get displayed to the user, so these definitely have two different uses in terms of user/display.

The problem I see comes when people want to explicitly tag H1, H2, etc., and your application decides if and when it will override them. This is (partially) what I mean by having semantics (markup via markdown/copymarkup) be primary. By not having this, and dealing only with HTML export or a native file format, you put all the control into your application and take it away from the user and the documents.

Further, I believe that documents themselves should not be the top level, but collections of documents (libraries). This is what Scrivener allows for, where you can define which part of a document collection tree is the top level for an export/compilation.

What this allows for is people to have a single editor instance and navigate across multiple documents, books and book elements. This is very fast when wanting to bounce around between various documents. Granted it can slow down on load and save if the entire structure is written out, but that is not much of a problem in Scrivener.

This helps further define things such that a book is not a single document, but a collection of documents (the idea of "book" being a container). Not only does this work with epub thinking but also with website thinking. A website is a collection of documents, but not a document itself (it is an address). This idea also fits the fact that each web page has its own address, as well as its own meta title and H1. In essence this means that a web page is a chapter in a web site (book).

This means that epubs and websites are on the same page, as it were, whereas PDF and ODT are (or rather, can be) at odds, insofar as a single file can be a collection of chapters (and accompanying images, usually including a cover image).

Yes I've been having a look at the EPUB spec and realised this is something I'll have to deal with, particularly if I have support for opening existing EPUB files and editing them (as opposed to just exporting).

My understanding was that H1 was intended for top-level sections of a document, where "top-level section" can have different meanings depending on the type of document. For example in a book this would be chapter (or for a really large book, possibly even part), and for a smaller article this would be section. In LaTeX for example there is the book class and the article class; in the former you have \part{...} and \chapter{...} commands, and \section{...} and \subsection{...}. The article class only has the latter two.

So it becomes a matter of mapping what is in the file to the different levels of headings, and this could be different for different file formats (or variations thereof). For example, when importing a LaTeX file, it would look at the document class to determine whether H1 is chapter or part, or whether H1 should be section instead.
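A rough sketch of that mapping in Python follows. The table `HEADING_ORDER` and the function `heading_levels` are hypothetical names, and the rule "use part as H1 only if the file actually contains \part" is an assumption for illustration, not something UX Write is known to do. The article entry follows the description above (section and subsection only).

```python
import re

# Hypothetical table: LaTeX sectioning commands in descending order,
# per document class, following the description in the text above.
HEADING_ORDER = {
    "book": ["chapter", "section", "subsection"],
    "article": ["section", "subsection"],
}

def heading_levels(latex_source):
    """Return {latex_command: html_heading_level} for a LaTeX source string."""
    match = re.search(r"\\documentclass(?:\[[^\]]*\])?\{(\w+)\}", latex_source)
    doc_class = match.group(1) if match else "article"
    # Unknown classes fall back to the article layout (an assumption).
    order = list(HEADING_ORDER.get(doc_class, HEADING_ORDER["article"]))
    # Assumption: a book that uses \part gets part as H1, pushing chapter to H2.
    if doc_class == "book" and "\\part{" in latex_source:
        order.insert(0, "part")
    return {command: level for level, command in enumerate(order, start=1)}

print(heading_levels(r"\documentclass{article} \section{Intro}"))
print(heading_levels(r"\documentclass[a4paper]{book} \part{I} \chapter{One}"))
```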

Fortunately there are two things which work in our favour regarding this:

  1. The H* tags are semantic only; their appearance can be customised totally. So you could quite reasonably use H2 as the tag for chapters, but have the style name as displayed in the UI as "Chapter", with H3 displayed as "Section" and H1 as "Title". This way the user wouldn't have to think in terms of the HTML tag names but rather their meaning. Also the outline navigator could be configured to start at H2 instead of H1 in this case. So there's potential to provide flexibility here as to how the different levels are presented in the UI vs. what specific tag names are used in the file.

  2. As I mentioned before, it's possible to "push up" or "pull down" the different heading tags by renumbering on load/save, if necessary. This is something which could be provided as an option to the user, or perhaps configured in a template/profile where you set up how you want your epub file structured (including single file vs. multiple files), depending on publisher's requirements.
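The "push up" / "pull down" renumbering can be sketched in a few lines of Python. This is only an illustration of the idea, not UX Write's implementation; `shift_headings` is a made-up name, and it works on an HTML string with a regex rather than on a parsed document in memory.

```python
import re

def shift_headings(html, delta):
    """Renumber every <hN> and </hN> tag by delta, clamped to h1..h6."""
    def renumber(match):
        level = min(6, max(1, int(match.group(2)) + delta))
        return f"{match.group(1)}{level}"
    # Matches "<h3" or "</h3" only when followed by whitespace, ">" or "/",
    # so tags such as <hr> or <header> are left alone.
    return re.sub(r"(?i)(</?h)([1-6])(?=[\s>/])", renumber, html)

chapter = "<h1 class='t'>Chapter</h1><h2>Section</h2><hr>"
pushed = shift_headings(chapter, 1)    # h1 -> h2, h2 -> h3
restored = shift_headings(pushed, -1)  # and back again
print(pushed)
print(restored)
```

Note that the clamping makes the shift lossy at the edges: an h6 pushed down stays h6, so a document that already uses all six levels would not round-trip cleanly.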

To date, UX Write has been designed around the concept of working with everything within a single document. It's certainly capable of doing this in terms of performance, but with EPUB this throws a bit of a spanner in the works because some EPUB books use separate files. Also I've had a few people raise the desire to view individual sections by themselves, similar to what you have in Scrivener. So it might be worthwhile expanding this to allow you to work with multiple files that are all in the one "package", much like Scrivener does. For some types of writing you just want a single document, but for others (esp. books) you want multiple documents packaged together. So it would be good to support both approaches.

I'd be very keen to get your input on the UI for this, what sort of options should be provided, and how we could come up with something that gives the right level of control to authors about how their document is structured. And perhaps with the market research work we've discussed part of what you could do is present some UI mockups to various authors and get their input. What do you think?

Yes, I am interested in doing work in this area. It is a problem area I am working out myself as a "lead user".

For me the key is to work out problems in the areas mentioned that will not cause problems in other areas, or indeed that can help lead to solutions in those areas.

For example: self-publishers also need a website, and some of their content (usually a few chapters) is hosted there. Exporting the document to markdown and/or HTML is fairly straightforward if the epub is taken as the model (one HTML page per chapter). Also, conceptually, the idea that a website is a book and a web page is a chapter does make some sense (though some web page "articles" are multi-page, that is usually for advertising revenue, attention-tracking, or SEO rather than readability/usability).

Also, the ability to edit on the website and have those changes pushed back into the source documents (or some kind of synchronization) would be helpful (though not essential).

Another issue: when using Google Docs for collaboration with writers and editors, we always created one document per chapter for editing purposes. So this makes sense as well.

As far as I know, most content management systems for documents are collections of files in some kind of nested folder structure. Most are too much trouble to deal with, other than for making files available in the cloud for backup and sharing (Dropbox, etc.).

The Calibre ebook library software is tedious to use (it is essentially a single searchable list as a collection, using tags and authors for grouping, kind of like another iTunes), but it does have the advantage of being able to import multiple versions, create multiple versions, and transfer ebooks between a desktop and an ebook reader device.


Telegram for Social Networking

Telegram is a great chat app, but there is both more and less to it than, say, Twitter or Facebook. First, much of the gamification of likes and thumbs-up is gone. Want to know if someone read your post? That has to happen via direct message or in a group (and the person has to respond). Recently, new APIs have appeared that enable discussions on posts and let channel posts appear as announcements in groups.

Types of Accounts in Telegram

There is a single namespace in Telegram for all entities: users, channels, groups, and bots. Users are individual accounts tied to a phone number (I think that is mandatory). Telegram Channels are one-way broadcast accounts, which can have multiple admins (but messages are signed by the channel). Membership in channels is unlimited. Telegram Groups can include up to 200,000 users, and everyone can post.

Using Bots for Commenting and Discussion

Note that for feedback on channel posts one can add a like bot or other such simple feedback, or add a discussion group and put that information in the channel description. A third, new option is a comment system using an app which is also available on the web as a preview (without logging into Telegram). The preview bot that does this works nicely and shows off the kind of API and developer support Telegram offers.

No Manipulation or Advertising

Telegram has none of the constant intrusion, 99% of it annoyance, of timeline distortion and advertising found in Facebook and Instagram (and to some extent Twitter, which is going down that same path).

Essentially, the use of channels with comments can replace any given social network (other limitations apply), such as Twitter, Facebook, and Instagram. While those platforms still have the lion's share of engagement and users, moving over to the Telegram way of things makes sense. For longform, Telegraph is a microblog platform which is very simple and also has zero advertising. There is a nice Telegraph App in the Google Play store.

Installing Telegram

For the Linux and ChromeOS world, the options are: Telegram Desktop (for Linux) and Telegram Android App (for ChromeOS).


Podcast Platforms

Podcasting is growing (slowly) and offers a great opportunity for brand engagement. Since the platforms are generally free, the idea is to be where the audience already is, and to have a reliable host for the content and the RSS feed.

Media and RSS Hosting

Google Podcasts and Google Play Music Podcasts

Note, these are two different things:

First Thing

  • Google Podcast (part of Google Search)
  • Google Podcast Publisher Tools
  • Google Podcasts App

Second Thing

  • Google Play Music Podcasts

Pocket Casts (#4 platform)

Stitcher (#3 platform)

Spotify (#2 platform)

iTunes/Apple Music (#1 platform)

WordPress Plugins


Epub Editing Tools

Tools change over time, but it seems that in the Epub world we have more of the same. As of November 2018:

  • Calibre's Epub Editor is pretty nifty
  • Sigil development stalled, then picked up again
  • Pagina Epub Checker is still under development and useful
  • Pandoc with or without some kind of TeX, LaTeX, or XeLaTeX (the last one is better for font support)

Things haven't really changed much over the past X years. Certainly not since the 2017 note on Epub tools.

Some Pandoc Resources


Kindle Paperwhite 4th Gen

I've used a Kindle since the Kindle Keyboard (3rd gen), and since then purchased and used the DX for a while (the much larger model). On 06 September 2012 the Kindle Paperwhite was released and I registered mine on 10 September. I broke that model within six months by wedging it in a bag that had too many objects in it, but Amazon sent out a replacement free of charge (which included free shipping, and I live outside the United States). Well folks, the first generation Paperwhite has served me well, and I did not feel a need to upgrade at the prices asked for fancy versions like the Voyage and Oasis, or for non-Kindle devices such as what Kobo offers. However, at this point, on the eve of the release of the fourth generation of the Paperwhite, that has changed, and I intend to upgrade.

Specifications of First and Fourth Generation Paperwhite

Generation   Dimensions           Weight     Lighting  Screen   Storage  Bluetooth Audible  Waterproof
First Gen    117 x 169 x 9.1 mm   213 grams  4 led     212 ppi  2gb      no                 no
Fourth Gen   116 x 167 x 8.2 mm   182 grams  5 led     300 ppi  8/32gb   yes                IPX8

Reasons to Upgrade

At 12% smaller (mainly due to thinness) and 15% lighter, less is more, and this is a significant motivator to upgrade. Storage is not an issue for me, and 8gb will be fine. The increased quality of the lighting (5 vs. 4 led) and screen resolution (300ppi vs. 212ppi) are nice, but not essential. Bluetooth Audible is ok. I don't use Audible now but might later. I certainly would not upgrade for that feature. The waterproof quality, combined with dimensions/weight and screen, is what puts this over the edge in terms of a desire to upgrade.
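As a quick check of those two percentages, here is the arithmetic in Python, using the figures from the table above. Reading "12% smaller" as a reduction in volume (width x height x depth) is an assumption; the variable and function names are just for illustration.

```python
# Figures copied from the specification table above.
first = {"dims_mm": (117, 169, 9.1), "weight_g": 213}
fourth = {"dims_mm": (116, 167, 8.2), "weight_g": 182}

def volume(dims):
    """Volume in cubic millimetres from (width, height, depth)."""
    width, height, depth = dims
    return width * height * depth

def percent_smaller(old, new):
    """How much smaller new is than old, as a percentage of old."""
    return (1 - new / old) * 100

volume_drop = percent_smaller(volume(first["dims_mm"]), volume(fourth["dims_mm"]))
weight_drop = percent_smaller(first["weight_g"], fourth["weight_g"])
depth_drop = percent_smaller(first["dims_mm"][2], fourth["dims_mm"][2])

print(f"volume: {volume_drop:.1f}% smaller")
print(f"weight: {weight_drop:.1f}% lighter")
print(f"depth:  {depth_drop:.1f}% thinner")
```

The volume drop comes out a little under 12% and the weight drop a little under 15%, and most of the volume change is indeed the reduced thickness (just under 10% on its own).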

Open Source, Open Content

While I do use a Kindle, most of my content I have in PDF and Epub formats. PDF is not very readable on the Kindle and I rarely use it there. However, Epubs are easy to convert using Calibre, an open source, cross platform library and ebook management tool. The DeDRM toolkit is very useful for stripping out the nasty DRM that comes with Kindle ebooks. I prefer unlocked files as my main library repository. Also, many ebooks are available at a variety of locations including Library Genesis, a resource of unparalleled breadth and depth. I prefer to use the Kindle device due to its quality hardware and the ease of access to Amazon's ebook offerings (I do regularly purchase content from Amazon). The DRM they use I simply work around or ignore. In the past I've rooted the Kindle Keyboard (3rd Gen), Kindle DX, and Kindle Paperwhite, though my current device is running stock Kindle software. I'm not irrevocably mated to Kindle and Amazon, but it is my current preferred platform.


Dokuwiki – The Canonical Wiki

Dokuwiki, over the last 10 years, has become the canonical wiki. By this I mean that Dokuwiki is the go-to wiki for most uses. While there are many other wikis which are popular and in use (e.g., Xwiki, MoinMoin, TikiWiki, etc.), the competitors (other than Mediawiki) do not exceed half of Dokuwiki's popularity. The only real competitor in terms of global mindshare is Mediawiki, and the only reason for that is of course Wikipedia and the other wiki properties run by the Wikimedia Foundation. Since Mediawiki is pretty much a shit show when it comes to management and resource consumption, Dokuwiki is the winner by default. Even with such a behemoth as a competitor, Dokuwiki has reached the point where it has more than half the generic searches in Google worldwide compared with Mediawiki.

That said, overall attention on wiki software as a category has declined over time, perhaps by half in the past 5 years for Dokuwiki (and much more for Mediawiki). The wiki as a communication tool has many competitors these days, especially in terms of enterprise and cloud-based groupware. Still, the main reasons for the ongoing success of Dokuwiki, I believe, are threefold:

  • Ongoing, consistent, quality, incremental updates;
  • Community-friendly architecture for plugins and themes; and
  • Minimalist resource requirements, including a flat file-only data store option (as standard).

Wiki vs. Blog

My own emerging use case is something that I tried to do years ago with Mediawiki, but because of the nature of Mediawiki (impoverished community and technical incompetence), it ended in tears. That is, as of now, I intend to replace websites which have been maintained on a multisite WordPress (+ Woocommerce in some cases). A Dokuwiki-based wiki farm along with a third party ecommerce service (Gumroad) should make things simpler, easier to maintain and extend, and escape MySQL hell. Note that I also intend to migrate off of a Mediawiki installation as well, but the multi-site blog replacement is as much of a pain point as the current Mediawiki is.

Desired Functionality

There are quite a few functions/services that are needed for full-fledged sites, including the following:

  • robots.txt
  • sitemap.xml + notification on updates
  • commenting system
  • user accounts, including email alerts, password mgmt
  • page and topic subscriptions
  • rename/rewrite/redirection on page name change
  • analytics (GA)
  • caching
  • ecommerce
  • Markdown extra
  • youtube video lazy load
  • anti-spam
  • contact form
  • quotes collection
  • widgets
  • cookie notice
  • seo metadata (Title, Description, norobots, noindex)
  • search (usable)
  • multi-site support

File Locations

  • /etc/nginx/nginx.conf

Pre-Dokuwiki Installation

Some kind of web server and some (recent) version of PHP are required. I used to use Apache and MPM Prefork with Opcache and PHP 5.6. As of now it is Nginx with PHP 5.6, PHP-FPM and APC Cache. All of these are hosted on AWS, either EC2 or (preferably) Lightsail. For full instructions, see:

  • OpenVPN on Amazon Linux, and
  • Amazon Linux, Nginx, LetsEncrypt, PHP.

Dokuwiki Installation

Dokuwiki Configuration

Dokuwiki Architecture

Dokuwiki Farms

Important Dokuwiki URLs and Location Info


Limitations of Dokuwiki vs. Mediawiki

Limitations of Dokuwiki compared with Mediawiki, that is, what Mediawiki can do that Dokuwiki cannot:

  • Dokuwiki cannot read DJVU files and generate images and pages that can then be edited using the Proofread extension. This is a key part of the workflow on Wikisource.

I think that's it.


Image / Scaling / Compression

Size matters, and the smaller the better, when it comes to generation, modification, transmission, and storage of information. The vast amount of unoptimized documents and images on my very own local storage, let alone what we send and receive all the time, is astounding. The idea that we need 100 GB or 1 TB of storage (thank you Dropbox) is sheer waste and sloth. I've addressed these issues a bit in the past, but it is time to take a bigger picture approach.

Past Articles on Compression

Maintaining Perceptible Quality

The key to the discussion is a focus on quality (relevance being its proxy in the engineering world). Quality is of course in the mind of the beholder, and so we look at who that is. Generally we are talking about humans on computers and mobile devices, websites and native apps. For a more sophisticated audience we are talking about display and print formats. Yes, generally more pixels might be considered better, but we are dealing with human eyes. For the moment (or decade) we can set aside the audience that has machine eyes which have learned to see in some way, since it does not (yet) exist.

Relatively Lossless Approaches

... MORE NEEDED HERE ... (Actual testing) ...

Here are some resources to try...

  • How can I reduce the file size of a scanned PDF file?
  • PDF Quality when converted
  • Cleaning up and shrinking a PDF file
  • Optimize PDF Files


DPI -- dots per inch -- and PPI -- pixels per inch (why not cm?) are meaningful only in relation to a given size (x by y inches), from which one can calculate the digital image size (number of pixels). This is from the world of print, though it now bleeds into digital display as well. Printers and digital platform vendors (e.g., Amazon, Apple, Google, Kobo, Nook) have specific DPI and image pixel size requirements based on what devices and formats they support. A given image may have a DPI setting, but that is metadata only (which is sometimes ignored, even if present -- we're looking at you, Adobe). It is quite simple to change the DPI metadata of an image. There are drag and drop websites for this.
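The relationship between physical size, DPI and pixel dimensions is simple multiplication, sketched below in Python. The function names and the 6 x 9 inch example are illustrative assumptions, not any vendor's actual requirement.

```python
def pixels_for_print(width_in, height_in, dpi):
    """Pixel dimensions needed to print at the given size and DPI."""
    return round(width_in * dpi), round(height_in * dpi)

def effective_dpi(width_px, width_in):
    """The DPI an image really delivers when printed at a given width."""
    return width_px / width_in

# A hypothetical 6 x 9 inch cover at 300 DPI.
cover_px = pixels_for_print(6, 9, 300)
print(cover_px)

# The same 1800-pixel-wide file printed 12 inches wide only delivers 150 DPI,
# whatever the DPI metadata stored in the file claims.
print(effective_dpi(cover_px[0], 12))
```

This is why the DPI tag alone says nothing about quality: only the pixel count and the target physical size matter.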


Caret vs. Caret – A Tale of Two Editors

Caret the Chrome App vs. Caret the PC App -- not sure which came first, but they are very different (except for the name, and the fact they are open source).

Caret the Chrome App

Note that Caret may possibly replace Atom in my workflow.

  • Caret in the Chrome Store
  • Caret website
  • Caret source on Github
  • Caret wiki on Github

Caret the PC App (Linux, OSX, Windows)

Note that while Caret the Chrome App may possibly replace Atom, Caret the PC App has some great built-in Markdown display (it is Markdown-focused rather than general-text-editor-focused).

  • Website
  • Caret on Github
  • Caret wiki on Github
  • Caret on Twitter


Grav CMS on Debian

This post will be frequently (or infrequently) updated. It is meant to help me learn Grav and Gravcart, and in particular migrate off of WordPress and Woocommerce.

Related Articles in Debian Services and Applications

  • Debian on AWS Lightsail
  • OpenVPN on Debian + UFW Firewall
  • Nginx and Letsencrypt on Debian
  • PHP & MariaDB on Debian
  • Grav CMS on Debian

Grav, Gravcart vs. WordPress, Woo

WordPress and Woocommerce have such overhead, including dependencies such as MySQL, that it is important to seek out a functional but higher performing option to manage modern websites and web storefronts.

Installing and Configuring Grav

The best approach is to download the Grav + Admin zip file, unzip it, and move the contents to the webroot. I've had issues with using github and composer, so the zip file is a less problematic place to start. ... details to come ... Finally, run bin/grav install to get plugin and theme dependencies:

bin/grav install

File Rights

I've found that permissions get jammed every now and then. Overwriting them with a script is the easiest approach, as follows:

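# Give the web server user ownership of the whole webroot
chown -R www-data:www-data /var/www/WEBROOT
# Directories: rwxrwxr-x plus setgid (the g+s line repeats the leading 2 of 2775)
find /var/www/WEBROOT -type d -exec chmod 2775 {} \;
find /var/www/WEBROOT -type d -exec chmod g+s {} \;
# Regular files: rw-rw-r--
find /var/www/WEBROOT -type f -exec chmod 0664 {} \;
# Grav's CLI scripts in bin/ must stay executable
find /var/www/WEBROOT/bin -type f -exec chmod 0755 {} \;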

Resources for Grav and Gravcart


Inkscape – Open Source Vector Graphics

Inkscape is an amazing vector graphics editor. It is free and open source and works on a variety of platforms, including Linux, Windows and OSX. Inkscape replaces Corel Draw and Adobe Illustrator, can read their files, and is a first class citizen among these other editors.

> This page will be semi-regularly updated to put my own Inkscape experiences into words. Last updated 19-Mar-2019.

Inkscape on Linux (Debian)

Install Inkscape from Flatpak:

sudo apt install flatpak -y
sudo flatpak remote-add --if-not-exists flathub
sudo flatpak install flathub org.inkscape.Inkscape
sudo flatpak update -y

Adjust the shortcut to run flatpak run org.inkscape.Inkscape

Inkscape on OSX

Unfortunately the mainstream OSX release runs on XQuartz, which is slow and doesn't support the standard OSX keystrokes and menus. Plus the windowing is not flexible enough. The main branch has continued development, and the idea is to get a native release working with Gtk 3, but it is unclear if or when that will take place. For years I have used an old 2013 release from Valerio Aimale. There is now a 2017 release for Inkscape 0.92.2, but it doesn't run on OSX 10.10 (Yosemite), so I am unable to test or use it. While s_uv is working on a next version for OSX with Gtk integration (called OSX Menu), it is still wrapped in XQuartz, with the same issues. As of mid-2018 I no longer use OSX, so things may have changed since then.

Inkscape Features and Functionality

  • Inkscape Keyboard and Mouse Reference

I use Inkscape as a drawing and illustrating tool and also for editing images in terms of compilations, extraction and svg-ification, logos, book covers, basically everything under the sun. As with any tool, getting efficient with Inkscape is a discovery process with a learning curve. As well, I happen upon a variety of features that continue to amaze, including:

  • Barcode generation: Extensions > Render > Bar Code
  • etc.

Inkscape supports extensions including:

  • Inkscape Map: Inkscape SVG files to HTML image map or coordinate list
  • Inkscape Table Support
  • Etc.