Tesseract OCR – Machine Learning

Tesseract OCR is a library and engine for optical character recognition. Version 4.0 adds a neural network engine based on LSTM, along with better support for training it. The Tesseract Wiki is a good place to start.
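As a rough illustration of selecting the LSTM engine from the command line, here is a minimal Python sketch that only builds the tesseract invocation and does not run it. The file names scan.png and scan_out and the helper name tesseract_lstm_command are hypothetical. The sketch assumes a Tesseract 4.x install, where --oem 1 selects the LSTM-only engine.

```python
import shlex


def tesseract_lstm_command(image_path, output_base, lang="eng"):
    """Build (but do not run) a tesseract command that selects the LSTM engine.

    Hypothetical helper for illustration; '--oem 1' is the Tesseract 4.x
    option for the LSTM-only OCR engine mode.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "--oem", "1"]


# Placeholder file names; substitute a real image and output base.
cmd = tesseract_lstm_command("scan.png", "scan_out")
print(shlex.join(cmd))
```

The resulting list could be handed to subprocess.run on a machine where tesseract and the chosen language data are installed.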

Genealogy Tools and Resources

This is a grab bag of websites, books, software, and services which have helped in this genealogical endeavor. DNA Analysis: FamilyTreeDNA, YSeqDNA, GEDmatch. I started with FTDNA, and it is difficult to work with them on exports, or really on anything beyond the basic interface. Also, their prices are high. YSeqDNA is cheaper, faster, better, ... Read more

DeepSpeech – Machine Learning

DeepSpeech: Speech Recognition, Machine Learning. These are notes on the project, which seems to me worth pursuing. Having recently seen a number of AWS re:Invent videos on Vision and Language Machine Learning tools at Amazon, I have ML-envy. Time to start a project, but while I wait for the Amazon Transcribe and Amazon Translate to ... Read more

Dropbox Cloud Storage and Sync

Dropbox is a cloud storage and sync service, with additional editors/apps, such as Paper and Showcase. For various reasons, those additional Dropbox apps are not useful for our use cases. However, storage and sync are excellent in and of themselves, and generally superior to Google Drive, which is the only real alternative. What Dropbox gets ... Read more

Dropbox Paper, Markdown, Sync

Dropbox Paper is a product I really want to like. For one thing, the promise of a better editor is something long unfulfilled. And taking some design cues (or perhaps merely unrelated similarities), Medium did do something nice for the blogging environment. By extending it as essentially a WYSIWYG Markdown+ editor, drag and drop-friendly, with handy ... Read more

Open Source Collaborative Docs

We can call this Tsuite, inspired by Toot Sweet (a Chitty Chitty Bang Bang candy invention). It is meant to provide some of the functionality offered through third-party collaborative documents. The main point is to have a self-hosted, free-and-open-source alternative, albeit with more bare-bones functionality. Ultimately the goal is to be functional enough to allow ... Read more

Machine Learning, Artificial Intelligence

See also: DeepSpeech, Tesseract. Recent Items on ML: IBM's Ginni Rometty gave a compelling keynote at CES on AI, and also answered a great set of questions on Bloomberg Technology. Harari's book 21 Lessons for the 21st Century has some interesting discussion of AI. One thing is that he has a tendency to ... Read more

AWS DHCP Options and Resolv.conf

AWS DHCP options are set on a per-VPC (Virtual Private Cloud) basis. By default, settings such as the search scope and DNS servers used by a given AWS instance are set by DHCP, which also provides the private IP address (but not any Elastic IP addresses). Indeed, the Amazon Virtual Private Cloud is a fundamental core ... Read more
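To make the search scope and DNS server settings concrete, here is a small Python sketch that reads resolv.conf-style text and pulls out the nameserver and search entries. The sample text and the function name parse_resolv_conf are illustrative assumptions, not output captured from a real instance. On an actual host the text would come from /etc/resolv.conf.

```python
def parse_resolv_conf(text):
    """Return the nameserver and search entries from resolv.conf-style text."""
    nameservers = []
    search = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "search":
            # If several search lines are present, the last one wins.
            search = fields[1:]
    return {"nameservers": nameservers, "search": search}


# Illustrative sample only; real values depend on the VPC's DHCP option set.
SAMPLE = """\
# hypothetical contents written by the DHCP client
search ec2.internal
nameserver 10.0.0.2
"""

print(parse_resolv_conf(SAMPLE))
```

Checking these two fields after changing a VPC's DHCP option set is a quick way to see whether an instance has picked up the new values.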

Server-Side Analytics and NoJS

The current system of analytics tracking is so very broken, let us count the ways: Most solutions require third-party trackers. These are easily blocked by third-party ad blockers/privacy tools. They are mostly JavaScript (and JavaScript can be disabled). They are a privacy nightmare, even when implemented properly. They slow a site down by increasing ... Read more

Inkscape – Open Source Vector Graphics

Current issues: The latest version of Inkscape has to be downloaded as an AppImage from the website. An earlier version is available via apt. Interoperability with Adobe Illustrator (AI) still has a fundamental issue, with AI using 72 ppi and Inkscape using 96 ppi (the CSS standard). Basically any objects in Adobe SVG files will appear to be ... Read more
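The 72 ppi versus 96 ppi mismatch comes down to a fixed ratio of 96/72 = 4/3. The Python sketch below shows only that arithmetic, assuming lengths are stored as unitless user units. The function name rescale_user_units is hypothetical, and this is not a description of how either program converts files internally.

```python
ADOBE_PPI = 72      # user units per inch assumed by Illustrator
INKSCAPE_PPI = 96   # user units per inch assumed by Inkscape (CSS pixels)


def rescale_user_units(value, from_ppi, to_ppi):
    """Convert a length in user units so it keeps the same physical size."""
    return value * to_ppi / from_ppi


# One inch is 72 units at 72 ppi; the same inch is 96 units at 96 ppi.
one_inch_in_inkscape = rescale_user_units(72, ADOBE_PPI, INKSCAPE_PPI)
print(one_inch_in_inkscape)
```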