Posted on

## WordPress – Soup to Nuts

I've written about WordPress at various points. I've been using this CMS for 13-14 years, and for me it is well known, though a bit worn out. Its breakage has not improved much, and its resource requirements are out of step with modern needs. Essentially, most performance gains come from improvements in Nginx, PHP, and MariaDB (thankfully, and not inconsequentially). WordPress is a "most dreaded" platform for 64.5% of developers answering a developer survey on Stack Exchange. This beats out the dread levels of its core enabling technologies, MySQL (50.4%) and PHP (58.6%). Simply put, WordPress has a premium dreadfulness to it. For me it is time for the devil I don't know, rather than the one I do. Even with the ClassicPress fork of WordPress, we are dealing with ossified technologies. Granted, they will likely not die (the code base is too large), but that does not make them forever bankable and safe, in the way that "nobody got fired for using IBM" held true in the past.


## WordPress 5 – Automattic Waterloo

Automattic is the organization behind WordPress the content management system, wordpress.com, and a number of smaller entities. By some estimates, WordPress has ~30% market share of the web. Automattic has taken on in excess of [$300m in funding](https://www.crunchbase.com/organization/automattic) over the years. After 2–3 years of development of WordPress, Automattic was founded in 2005 to receive an initial funding round of $1.1m.

## Competition and Growth

Competition is seen as coming foremost from the lower-end, simpler website design companies such as Wix and Medium. Basic usability and ease of use of the WordPress editor is seen as a stumbling block to growth, especially by investors who seek a return. Matt Mullenweg, the co-founder and CEO, is not shy about demonstrating the user problems, as seen in his most recent State of the Word presentation, from 10 December 2018.

While there is an interesting solution provided in terms of the Gutenberg project and blocks to replace the WYSIWYG/code view editor, it is in no way an answer to novice users creating pages that have complex visuals (other than possibly copy-paste from Word or Google Docs). More importantly, by removing the current WYSIWYG/code view editing interface that all intermediate and advanced users have mastered, everyone is forced into a learning curve regarding these less-than-intuitive blocks. Certainly it is a mental model, as Mullenweg suggests, just not an intuitive one, or one that the interface makes readily apparent.

To allow for a transition period (aka Phase 2), the old editor will remain available by means of a plugin, with support promised until 2021. The incipient integration of Gutenberg into Core caused quite a bit of disgruntlement, and induced a group to do what is always possible with open source software: create a new release from the old source code.

## ClassicPress, calmPress Forks of WordPress 4.9

Strengths can be weaknesses, and the open source strength of WordPress has now been used against it in the form of hard forks of the project. ClassicPress released its first version, which is a fork of WordPress 4.9. Work began on this hard fork on 30 August, with alpha and beta releases on 24 October and 21 November. calmPress, another fork of WordPress 4.9, is the effort of a single developer. calmPress 0.9.9 was released on 29 November 2018, with alpha and beta versions starting back in September. There was discussion about collaboration on a shared plugin directory between calmPress and ClassicPress, but that has not progressed.

## ClassicPress Organizational Development

ClassicPress calls itself a business-focused release. That is, professional, stable, reliable performance. Already ClassicPress is undergoing some performance tuning and a focus on security. The main point is to dodge the bullet of Gutenberg, which becomes integrated into Core with WordPress 5.0. Building a successful software project requires proper, effective guidance as well as resources (programming and money). From the ClassicPress forum and Slack channel, these discussions appear to be taking place, and developers are indeed doing the necessary, day-to-day blocking and tackling.

## WordPress 5 Released

WordPress 5.0 was released on 06 December 2018. On 12 December WordPress 5.0.1 was released to include some security bug fixes. However, this also began to introduce breakage.

## This is a Waterloo

The Battle at Waterloo has become a metaphor for something difficult to overcome, or recover from. With novices unable to easily adopt the new interface, and with a good swath of intermediate and advanced users in open rebellion against the change, there are now opportunities for sharpened knives.

The forces arrayed against Automattic are as follows:

- Those who will defect to a hard fork (ClassicPress, etc., see above)
- Those who will defect to an alternate platform (Grav, etc., see below)

The main forces for Automattic are:

- User base inertia,
- Community that will censor defectors to a hard fork, and
- The WooCommerce and subsidiary plugins which make finding a replacement a more complex and time consuming task. (This is akin to trying to supplant Windows without having an alternative to Office.)

## Troop Strength and Depth

While this might seem like a less difficult challenge than the fated Waterloo, the strength of Automattic's development ranks is thin and ragged. The ability to create quality code and a quality experience should be seriously questioned. For example:

- Two plugins remain in Core that cannot be touched (for the obviously irrelevant political reason that they were created more than a decade ago by the CEO), and lead developers have to resort to lying about it in the bug tracker. In ClassicPress, those two plugins were removed in the first Alpha release.
- The infamous WordPress plugin repository redesign fiasco of 2015–2017.
- Last but not least, the hostility to and distaste for Gutenberg to date.

If it were a matter of executing and providing a speedy and pleasant experience, then the rather steep learning curve could be mastered. Instead, the very same puzzling experiences found in user testing with novices using the current editor will be found writ large with not only novices, but intermediate and advanced users of the previous platform. As one reviewer put it, "I'm tripping over my own feet."

Again, it will take more than evangelism to win this battle, because the quality of the WordPress package is itself in question, including the ridiculous redesign of the Plugin directory and its functionality. This is not to mention the antiquated development tools and processes that continue to cause WordPress, like an old jalopy, to rattle and shimmy down the backroads and washed out valleys of bloatland.

## Humans Hate Change

If the above were not enough, there is the very basic psychology that is arrayed against Automattic in this significant change, which is: humans hate change. Witness:

- Why redesigns don't make users happy
- Why most redesigns fail

## Alternative to WordPress -- Flat File CMS

It is important to consider another issue with WordPress that adds complexity and resource requirements, and which for many sites is unnecessary: the requirement for a database. Flat file content management systems are increasingly functional and reliable, and have significant advantages over the use of a database. Databases are generally opaque, more difficult to inspect, require their own backup and restore procedures, have their own security, and use more resources (specifically RAM, but also processor). With advanced caching readily available, they do not offer much in the way of benefit. For special uses such as shopping carts and session management, a database can be used as a supplement to a Flat File CMS, but for serving most content, it makes little sense.

Grav CMS, a maturing Flat File CMS, is a viable alternative to WordPress for certain use cases, perhaps even the majority (and has shopping cart plugins available). For those developers, administrators, and end users, like me, who have spent more than a decade with WordPress and are looking for a platform for the next 10 years, Grav looks quite promising, as does ClassicPress. WordPress? Not so much.


## This post will be frequently (or infrequently) updated. It is meant to help me learn Grav and Gravcart, and in particular migrate off of WordPress and WooCommerce.

Related Articles in Debian Services and Applications

- Debian on AWS Lightsail
- OpenVPN on Debian + UFW Firewall
- Nginx and Letsencrypt on Debian
- PHP & MariaDB on Debian

## Grav, Gravcart vs. WordPress, Woo

WordPress and WooCommerce carry so much overhead, including dependencies such as MySQL, that it is worth seeking out a functional but higher performing option to manage modern websites and web storefronts.

## Installing and Configuring Grav

The best approach is to download the Grav + Admin zip file, unzip it, and move the contents to the webroot. I've had issues with using GitHub and Composer, so the zip file is a less problematic place to start.

... details to come ...

Finally, run bin/grav install to get plugin and theme dependencies:

bin/grav install


## File Rights

I've found that permissions get jammed every now and then. Overwriting them with a script is the easiest approach, as follows:

chown -R www-data:www-data /var/www/WEBROOT
find /var/www/WEBROOT -type d -exec chmod 2775 {} \;
find /var/www/WEBROOT -type d -exec chmod g+s {} \;
find /var/www/WEBROOT -type f -exec chmod 0664 {} \;
find /var/www/WEBROOT/bin -type f -exec chmod 0755 {} \;
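
For anyone who would rather keep this in a script with a little more control, here is a minimal Python sketch of the same mode reset. The function name and its parameters are my own for illustration, and ownership (the chown step) is left out because it needs root.

```python
import os


def reset_permissions(webroot, dir_mode=0o2775, file_mode=0o664, bin_mode=0o755):
    """Mirror the find/chmod commands above (chown is not handled here).

    Directories get dir_mode (2775 includes the setgid bit), regular files
    get file_mode, and files under <webroot>/bin get bin_mode so the CLI
    tools stay executable.
    """
    bin_dir = os.path.join(webroot, "bin")
    for current, _dirnames, filenames in os.walk(webroot):
        os.chmod(current, dir_mode)
        in_bin = current == bin_dir or current.startswith(bin_dir + os.sep)
        for name in filenames:
            path = os.path.join(current, name)
            if os.path.islink(path):
                continue  # leave symlinks alone, chmod would follow them
            os.chmod(path, bin_mode if in_bin else file_mode)


# Example call (hypothetical path):
# reset_permissions("/var/www/WEBROOT")
```

Note that the kernel may silently drop the setgid bit if the user running the script is not a member of the directory's group, so it is worth running this as the same user that owns the webroot.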



## WooCommerce, WordPress Plugin Sites

There are several sites which sell unlimited access to a large number of WooCommerce and other themes and plugins. These sites are taking advantage of the GPL, which allows for free distribution, though it is unclear if they are violating use agreements (or trademarks). In any case, each of these sites has a set of plugins; many are on all sites, but not all.

- WooTheme Plugins: $99 USD/year (best yearly price)
- gplChimp: $15 USD/mo. Lots here
- Effectio: $15 USD/mo. Some stuff here, too
- Sozot: $15/mo. Nice collection of Woo, other plugins, and themes
- Null.Market: $2/mo for certain plugins, $11/12 months. Best prices (for paid sites)

There are also a few free sites; I am not sure if they have done something to the code.

- gplpro (free and paid memberships)
- dlwordpress
- null24
- gplfox
- gpldl
- Seylin
- wpFree

To test drive a bunch of themes or plugins, either use the free sites or get gplChimp (which has a few items that Effectio does not), and then if updates are needed more than 3x per year, move over to WooTheme Plugins. Personally I find that gpldl is very full featured for registration-only, no cost.


## WordPress Multisite on Amazon Linux

This assumes a current configuration of:

- Amazon Linux (6.x RHEL series)
- Apache 2.4
- PHP 5.6 + Opcache
- Oracle MySQL 5.7

Installation up to this point is encompassed by:

- OpenVPN on Amazon Linux EC2, basic configuration and securing an EC2 instance
- Amazon Linux, Apache, MySQL, and PHP, installing and configuring

## Install WordPress from Subversion

This is the standard quick install. It is advised to check out the most recent stable version, and not the main branch, which can break (more) things. First, install Subversion:

yum -y install svn


For Debian:

apt-get install -y subversion


Visit Installing WordPress with Subversion, and look for a command line that looks like the following.

svn co https://core.svn.wordpress.org/tags/5.0.2 .


The final number will change over time. Currently the options for Git are a bit malnourished.

## Create Database and User

sudo mysql -u root -p


Create database

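-- dbname, user, and password are placeholders; substitute your own values.
-- (A literal name of "database" would fail, since DATABASE is a reserved word.)
CREATE DATABASE dbname DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

CREATE USER 'user'@'localhost' IDENTIFIED BY 'password';
GRANT ALL ON dbname.* TO 'user'@'localhost';
FLUSH PRIVILEGES;
EXIT;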


## Create wp-config.php

First, copy the sample file into a config file

cp /var/www/html/wp-config-sample.php /var/www/html/wp-config.php


Next edit the four parts of the file:

nano /var/www/html/wp-config.php


Change these:

- Database Name
- User Name
- Password
- Table prefix

Also add the following at the end:

/** to set update method, rather than changing file access */
define('FS_METHOD','direct');


Save the file, then load /index.php in a browser to continue.
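
If you set up sites often, the copy-and-edit step can be scripted. Below is a rough Python sketch, assuming the sample file still uses the stock placeholders (database_name_here, username_here, password_here) and a `$table_prefix = 'wp_';` line. The function names are my own, and the values passed in are made up.

```python
import re


def php_quote(value):
    """Escape a value for use inside a single-quoted PHP string."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def render_wp_config(sample_text, db_name, db_user, db_password, table_prefix="wp_"):
    """Fill in the four settings of wp-config-sample.php and return the new text."""
    text = sample_text
    for placeholder, value in (
        ("database_name_here", db_name),
        ("username_here", db_user),
        ("password_here", db_password),
    ):
        text = text.replace(placeholder, php_quote(value))
    # Swap whatever prefix is between the quotes; the lambda avoids
    # backslash surprises in the replacement string.
    text = re.sub(
        r"(\$table_prefix\s*=\s*')[^']*(';)",
        lambda m: m.group(1) + php_quote(table_prefix) + m.group(2),
        text,
    )
    return text


# Example use (hypothetical values):
# from pathlib import Path
# sample = Path("/var/www/html/wp-config-sample.php").read_text()
# Path("/var/www/html/wp-config.php").write_text(
#     render_wp_config(sample, "dbname", "user", "password", "wp_"))
```

The FS_METHOD line and the security keys still need to be added by hand (or by extending the sketch).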

## Multisite

WordPress Multisite has advantages (and some disadvantages). The process to change a single site into multisite has several steps.

- Disable all plugins
- Edit wp-config.php to include the following

sudo nano /var/www/html/wp-config.php


/* Multisite */
define( 'WP_ALLOW_MULTISITE', true );


Note that this will then allow you to take the next steps.

- Administration > Tools > Network Setup
- Configure for subdomains
- Once completed, copy the text for .htaccess into httpd.conf (usually this redirection is safe for single site domains as well).

RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]

# add a trailing slash to /wp-admin
RewriteRule ^wp-admin$ wp-admin/ [R=301,L]

RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
RewriteRule ^(wp-(content|admin|includes).*) $1 [L]
RewriteRule ^(.*\.php)$ $1 [L]
RewriteRule . index.php [L]

• Comment out the above item entered into wp-config.php, and instead replace with:
define('MULTISITE', true);
define('SUBDOMAIN_INSTALL', true);
define('DOMAIN_CURRENT_SITE', 'host.domain.com');
define('PATH_CURRENT_SITE', '/');
define('SITE_ID_CURRENT_SITE', 1);
define('BLOG_ID_CURRENT_SITE', 1);
define( 'SUNRISE', 'on' );


Note: It is very important to place this where it indicates, just before "stop editing" below.

- Copy the sunrise.php file to /wp-content/.
- Restart Apache
- Install and enable WordPress MU Domain Mapping
- Change the settings in Network Admin > Settings > Domain Mapping to 2,5 (the opposite of the default)
- Add domains to the mapping as desired
- Set redirections and site defaults to their desired domain name

## Reset Filesystem Security Script

Filesystem security can get wonky, especially with WordPress plugin and theme updates and manual file copying and editing. There are two things to do:

- Make a script that backs up essential configuration files
- Make a script that resets all the security in the file paths

This is an example of the second:

chown -R username:apache /var/www
find /var/www/html -type d -exec chmod 2775 {} \;
find /var/www/html -type d -exec chmod g+s {} \;
find /var/www/html -type f -exec chmod 0664 {} \;
chmod 700 /var/www/html/.b*
chmod 1700 /var/www/html/.ssh
chmod 600 /var/www/html/.ssh/authorized_keys
echo ' ';
echo '***************************************************';
echo 'changed ownership and security on wordpress install';
echo '***************************************************';
echo ' ';
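
The first script (backing up essential configuration files) is not shown above. Here is a minimal Python sketch of what it could look like. Which files count as "essential" is my assumption (wp-config.php and the Apache config are the ones touched in this post), and the function name and paths are hypothetical.

```python
import tarfile
import time
from pathlib import Path


def backup_config_files(paths, dest_dir, label="config-backup"):
    """Write the given files into a timestamped .tar.gz under dest_dir.

    Missing files are skipped rather than treated as errors. Returns the
    archive path and the list of files that were actually saved.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{label}-{stamp}.tar.gz"
    saved = []
    with tarfile.open(str(archive), "w:gz") as tar:
        for path in map(Path, paths):
            if path.is_file():
                # Store without the leading slash so the archive unpacks relative
                tar.add(str(path), arcname=str(path).lstrip("/"))
                saved.append(str(path))
    return archive, saved


# Example call (hypothetical file list and destination):
# backup_config_files(
#     ["/var/www/html/wp-config.php", "/etc/httpd/conf/httpd.conf"],
#     "/root/backups",
# )
```

Running the backup first, and the permissions reset second, means a botched edit can always be rolled back.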


## PHP Session Handling

WordPress does not use PHP Sessions, and plugins need not, therefore:

- Eradicate plugins which use @session_start(); which includes (as per latest scan):
  - wp-affiliate-platform,
  - wp-spamshield,
  - woocommerce-amazon-s3-storage, and
  - php-compatibility-checker (which is only needed for testing, in any case)

cd /var/www/html
grep -r 'session_start' .
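
The grep output lists every matching line, which gets noisy. A small Python sketch that groups the hits by plugin directory is below. The function name is my own, and it assumes plugins live one directory deep under wp-content/plugins.

```python
import re
from pathlib import Path

# Matches session_start( with or without the leading @ error suppressor
SESSION_CALL = re.compile(r"\bsession_start\s*\(")


def plugins_using_sessions(plugins_dir):
    """Return {plugin_name: [relative php files]} for files calling session_start()."""
    base = Path(plugins_dir)
    hits = {}
    for php_file in base.rglob("*.php"):
        text = php_file.read_text(encoding="utf-8", errors="ignore")
        if SESSION_CALL.search(text):
            relative = php_file.relative_to(base)
            hits.setdefault(relative.parts[0], []).append(str(relative))
    return {name: sorted(files) for name, files in sorted(hits.items())}


# Example call (path assumed from the install above):
# for plugin, files in plugins_using_sessions("/var/www/html/wp-content/plugins").items():
#     print(plugin, len(files))
```

This is a plain text match, so a session_start mentioned only in a comment will also be reported.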


## Caching Configuration in WordPress

### W3 Total Cache

General Settings

- Page Cache, Disk: Enhanced
- Minify (disabled)
- Database Cache, Disk
- Object Cache (disabled)
- Browser Cache (disabled, we do this manually in httpd.conf)
- CDN (disabled)
- Use single network configuration file
- Purge Policy: Posts page, Post page

Page Cache

- Cache posts, SSL, Don't cache logged in
- Prime page cache, 900, 10
- Preload post cache upon publish
- Sitemaps regular expression [a-z0-9_\-]*sitemaps\/[a-z0-9_\-]*\.(xml|xsl|html?)(\.gz)?
- Rejected Cookies:

wptouch_switch_toggle
ap_id
cart_in_use
eMember_in_use

• Never Cache the Following Pages
wp-.*\.php
index\.php
[a-z0-9_\-]*sitemap[a-z0-9_\-]*\.(xml|xsl|html?)(\.gz)?
favorites\.php
cart
checkout
shop
/shop*

• Note: must include any changes to permalinks and the pages above

Database Cache

• Don't cache for logged in
• Ignore Query Stems
gdsr_
wp_rg_
_wp_session_
_wc_session_
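
Since the "Never Cache" entries are regular expressions rather than plain paths, it helps to try them against a few URLs before trusting them. The Python sketch below does that for the list above. It assumes the plugin applies each entry as an unanchored, case-insensitive match against the request path, which I have not verified against the plugin source, so treat the results as indicative only.

```python
import re

# The "Never Cache the Following Pages" entries from above, unchanged
NEVER_CACHE = [
    r"wp-.*\.php",
    r"index\.php",
    r"[a-z0-9_\-]*sitemap[a-z0-9_\-]*\.(xml|xsl|html?)(\.gz)?",
    r"favorites\.php",
    r"cart",
    r"checkout",
    r"shop",
    r"/shop*",
]


def first_matching_rule(path, patterns=NEVER_CACHE):
    """Return the first pattern found anywhere in path, or None if it is cacheable."""
    for pattern in patterns:
        if re.search(pattern, path, re.IGNORECASE):
            return pattern
    return None


# Example checks (made-up paths):
# first_matching_rule("/wp-login.php")     matches the wp-.*\.php rule
# first_matching_rule("/about/")           matches nothing, so it can be cached
# first_matching_rule("/workshop-notes/")  matches "shop", probably not intended
```

Under that assumption, short entries such as shop and cart match any URL containing those letters, so a post slug like workshop-notes would never be cached. Anchoring the entries (for example with a leading slash) is worth considering if that matters.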


### Autoptimize

• Optimize HTML, Keep HTML Comments
• Optimize Javascript, aggregate inline JS
• Optimize CSS, Remove Google Fonts
• Save aggregated as static files = uncheck

## Code Cleanup

A good part of speed issues is the actual site code (PHP/JS/CSS/HTML), and when it comes to WordPress, especially WordPress plugins, there are a lot of potential conflicts. Blocking JS and CSS is a big part of the problem, as is all the default crap that is not needed (such as various webfonts) and should be removed.

- Clean up nonblocking Javascript and CSS
- Too many CSS files and embedded CSS in HTML, and too many JS files
- Google's Accelerated Mobile Pages
- Cache-aware websites


## Math on the Web

Below are bulleted list items. Later this will turn into better copy.

- Summary: Use Texvc when necessary, and KaTeX when possible.

## Math in Chrome

• SVG is the preferred method for Math on Chrome, though accessibility is still an issue
• Basically, for Chrome, the options are: ship small images at a great expense in time (MathJax), render as HTML on the server (KaTeX), render as HTML in the browser (KaTeX), or finally ship images out of Texvc (faster than MathML)
• Chrome does not support MathML

## Texvc

• Texvc is a PHP Server-side extension that works/worked on MediaWiki at least 10 years ago
• Basically Texvc will short-circuit the MathML and deliver images instead (not very usable or accessible, but it works better than MathML)


## MediaWiki vs. WordPress

There of course is no MediaWiki vs. WordPress in the sense of a battle. As Wiki and Blog platforms go, each is the winner in its category in terms of raw number of users/pageviews. That said, there are definitely (different) concerns with each platform, architecturally as well as accidentally. And therefore, we dredge up the battle metaphor. To the fighting pits!

## Markdown vs. Wikitext

Markdown isn't the default in WordPress; indeed there is way too much emphasis on the visual editor. That said, Markdown is common and available via plugins, and shortcode functionality is also prevalent. For MediaWiki, the wikitext markup remains dominant, to the exclusion of Markdown. But nowhere else than MediaWiki is wikitext deployed.

## Namespaces and Transclusion

MediaWiki namespaces are ways of organizing kinds of documents (sometimes without much real effect other than naming), as well as allowing for transclusions and templates. For WordPress, templates, or better, custom post types, are monolithic and govern an entire page of a certain kind (for example, products). While custom post types and templates are distinct, and there can be more than one template for a given custom post type, they are essentially managed as an area for programming. MediaWiki templates (powered by MediaWiki transclusion extensions) are looser and easier to edit, so that moderately capable editors can customize the look and feel of pages without needing administrative access. This gives MediaWiki a more democratic and flexible system, which however ends up creating an additional level of administrative editing work. If you've got millions of editors, this is fine, and necessary, but if not, it becomes more difficult to manage.

## Caching Techniques

MediaWiki has some built-in caching, and for WordPress this is the domain of plugins. Still, these sit on top of PHP, MySQL, and Apache, so the caching strategy is the same.

## Themes and Skins

Themes in WordPress are where the look and feel, layout and design live, while for MediaWiki these are skins. As with most things CSS (and a little JavaScript), the customization can be extensive. The trouble with skins, besides the fact that most are very ugly, is that the paradigm of the Wikipedia page generally dominates. Wikiwand has gone far to beat back that design, and done so effectively.

## Templates, Templates, Templates

Templates tend to grow like mushrooms. For example this page has 53 templates. There are mainly just a few template types:

- Page or fragment formatting templates
- Info-box style templates
- Weird parsing or inclusion templates

From an architecture perspective, this is obviously nuts (a technical term). First off, getting down to the root of it, there should be widgets, templates, and plugins. While certainly it is convenient that this is the WordPress model, the reality is that not managing these issues site-wide is a recipe for disaster. One ends up with... 53 templates. Bartleby the Scrivener, indeed.

## Javascript and CSS

For WordPress, including JavaScript and CSS is generally straightforward, and there are plugins such as the masterful HeadSpace which make insertion of includes straightforward. In comparison, MediaWiki's approach doesn't always work very well. There are the common files, but adding includes is not obvious. Wikia documentation helps out (but again, is incomplete). Technical documentation of MediaWiki is by far the weakest and most troubling part of the distribution. Technical documentation is either non-existent, incomplete, or out of date -- usually a combination of all three.

## MediaWiki and WordPress - Deep in Technical Debt

The attempts to make sane improvements to MediaWiki and WordPress (and most recently, WooCommerce) have exposed an enormous amount of technical debt. MediaWiki makes mention of this, but its attempts to address it are essentially "don't touch what we can wait to touch later." WooCommerce 3.0, in less than a month, has released six bug-fix patches, having broken things for a huge part of its customer base. The insanity continues with WordPress releases, which no longer have timelines (only some kind of undefined feature release rationale).

## A Revisit for Sanity

Both MediaWiki and WordPress have an extremely poor core technology stack, and while each can be made to work and scale, the process is generally painful. In addition, with version control, distributed collaborative editing, and website display at the core of both, there are few reasons not to build something that can fix both of these problems, provided the core functionality of both applications is built first, and the architecture is thought out better. This should fix speed issues and caching issues.

## Challengers and Replacements

Because part of the issue has to do with the database requirements, flat file systems have a distinct advantage. Additionally, active, full-featured projects that are able to do some kind of migration/import are a strong consideration. Two in particular are:

- WordPress / WooCommerce --> Grav / Gravcart
- MediaWiki --> DokuWiki


## Sclerotic Teens – WP & MW

Two very popular content management systems are in their teenage years now: WordPress will be 14 this year, and MediaWiki will be 15. Those are a lot of years on the web. As teenagers, these two successful and interesting projects try to act like the adults they want to be. Unfortunately, this can lead to an obsession with trying to live the high life, and to modeling themselves after their adult role models. Sadly, many teens don't make great choices in whom they look up to, adulate, and emulate. Teens tend to try to grow up too fast and focus on the wrong things, get obsessed with fads, and judge themselves by external factors that have nothing to do with their own success, or happiness.

## Teen WordPress

WordPress is very chatty, focused on a visual editor, and on "real people" (by that they mean not nerds). But nerdy cool is what got this teen to where they are. In fact, using nerds to do the design work gets you, well, something not very elegant, nor usable, and half-broken. Read the scathing report (and deep background) on the WordPress Plugin Repository Revamp. And so while WordPress has nerd roots, these are not taken very seriously (other than letting the actual nerds try to figure out how to engineer and project manage what they don't understand), and a horribly outdated development workflow is still the norm. In addition, plugins like Hello Dolly are still in the default install. This is because Automattic is run by a teenager.

## Teen MediaWiki

This teen still has a deep-rooted editor and uptime bias. That's a good thing. But let's look over one of the technical teams. It appears that the reader team has many designers and two programmers? And the parsing team, the only one that can actually help extend most extensions, including reformatting for different formats (such as ebooks), has few resources indeed. Here is a great example of what is wrong, namely lots of academics with no sense of urgency. A Summer of Code contributor from 2013 has already implemented annotation functionality, and we get a group of deep thinkers coming up with... nothing. The wrong people are being hired at MediaWiki, certainly in terms of important functionality like basic annotation. Instead, MediaWiki is focused on growing an organization (not improving what they offer, or the user experience). This is again the same teenage obsession with something other than what one is doing right now.

One obvious problem for MediaWiki is the visuals. This is odd for a teenager to not tune in to. Wikiwand, a 2013 startup which is now a team of 10, has produced a skin for Wikipedia and catapulted itself to a ranking of 1,500 on the web.

## User Experience

There is so much to improve in the core user experience of both of these successful products (which I deal with daily, as a user and sysadmin). Indeed, this gloss is in need of much more detail, but it is a start to say that focus has been lost, and not in a good way. Money and success can cloud judgment quite easily, especially for the teenage mind.

## Sclerosis

The danger here, beyond simply misspent youth, is that a sclerosis is setting in regarding basic features. That is, meaningful change (technical, organizational) is becoming out-of-reach for these crazy kids. Opportunities are being missed.


## WordPress Plugins Redesign 2017

If a phrase could sum up what we've seen so far in WordPress in 2017 it would be something like:

> Bureaucratic nonsense, shitty design, tone deaf development.

The Core crew and their work on Plugins has recently turned from tragedy to farce. While these hardworking plebes have put in the hours, their result is, frankly, pathetic. They couldn't do much worse if they tried. They've broken search (which was crap to begin with, worse now), and their design is optimized for an iPad, nothing bigger or smaller. At every turn there has been scorn placed on requirements and suggestions proposed by developers, because, well, they are developers! And damned be the developers. (It has a kind of anti-Microsoft ring to it.)

## Plugin Repository Redesign Fiasco

Before we start in on WordPress, and search, it is useful to understand my own background regarding WordPress, and search. I've used WordPress regularly since Kubrick (v1.2, 2005). My experience with the web, and search, was earlier. From 1999-2001 I took courses in old school library science, information science, newly spawned information architecture (a combination of library science and human factors, rebranded user experience), and the like. I graduated in 2001 with an MS in Information Management and Systems (a unique degree name; our gown color was from the more established MS in Information Science degree).

What brought the program, then named Information Management and Systems and since rebranded the School of Information, to my attention was the book Information Rules by Carl Shapiro and Hal Varian. Hal had become the Dean of the School. I applied, and two years later was a part of the third graduating class. (Hal later left and became the Chief Economist at Google.)

The school is the newest (and smallest) school at Berkeley. However, its roots at the university are ancient, and it is situated in South Hall, the oldest building on campus, constructed in 1873, and original home of the first Physics laboratory in the United States. Essentially it is a re-organized library school, with new faculty hired with dual appointments at several different schools across campus, including the Law School, Computer Science, Engineering, Economics, Public Policy, etc. This provides a necessary interdisciplinary and multidisciplinary orientation.

When I attended, it was as a former Berkeley grad in the Interdisciplinary Studies Field, with a focus on literature and philosophy, who had become a network engineer in industry over the five years since graduation. My interests were not so much with the data networking I had been doing, but with the burgeoning startup scene (and incipient collapse). I was interested in programming, and developed some skills in that, but it turns out that the various courses available were intrinsically interesting, and they provided a basis of modern education I use today, including:

- Product Design (Mechanical Engineering course)
- Intellectual Property (Law School course)
- Internet Law
- Information Classification
- Usability and Interface Design
- Information Retrieval\*
- Library Services\*

\*These last ones were a surprise to me, and actually I don't recall why I took these old school courses from Michael Buckland, which turned out to be the most relevant, not least of all because these drive search on the Internet.

## Information Retrieval and the Found Set

The two basic concepts, exploited to great effect by Google, are based on the human concept of relevance. Relevance is always what is relevant to a given searcher with a particular information need. Again, this is a human concept, and therefore can only be approximated by a machine, which ultimately needs human judges to evaluate its effectiveness at relevance approximation. The human judges in terms of relevance, it turns out, are retired CIA analysts.

The basic (human) search is as follows: given an information need, and a set of results (documents), which of those documents are relevant to the information need? Before the use of computers, there was a mechanical use of cards (which held metadata about certain documents). These cards would have holes punched in them at various places in two dimensions. Those holes corresponded to certain categories. Rods could be inserted in these holes through a set of cards. The cards that stayed connected to the rods were the relevant ones, and those that did not were called the dropped set.

The initial categorization via hole-punching was replaced by a vector-space model determining the content (category, keyword) relevance of a given document to a given query. Conceptually this is still the same, though the algorithms are much more complex these days. And so, what is important is what can be known through metadata (title, description, age, etc.), document structure, and the content of documents (words, phrases, word count, other patterns). There is then a matching of a search term with related documents. This again is the found set, as above with cards.
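
To make the vector-space idea concrete, here is a toy Python sketch: documents and the query become term-weight vectors (TF-IDF in this case), and the found set is whatever scores above zero on cosine similarity. The weighting scheme, the function names, and the sample documents are my own illustration, not how any particular search engine does it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_vectors(docs):
    """Return one {term: weight} vector per document, plus the idf table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency: rarer terms weigh more
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def found_set(query, docs):
    """Return [(doc_index, score)] for documents sharing terms with the query, best first."""
    vectors, idf = tf_idf_vectors(docs)
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted([s for s in scored if s[1] > 0], key=lambda s: (-s[1], s[0]))


# Example call (made-up documents):
# found_set("cache", ["cache plugin for WordPress pages",
#                     "contact form builder",
#                     "page cache and object cache"])
```

Documents sharing no terms with the query score zero and fall out, which is the software version of the dropped set.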

## Relevance Ranking and Signals of Eminence

Once a found set is known, the question (in an age of an abundance of information, but a deficit of attention) turns to ranking. For scientific journals, ranking and impact analysis was driven by work in citation analysis by Eugene Garfield in the 1960s and 1970s (but first posed in the 1950s), and enabled by increasing statistical analysis done by computers. Citation analysis across articles could attribute the impact of a given journal to what was published there. This meant that future articles in a given journal would have a higher or lower probability of citation, but also that it could clearly indicate which articles themselves were more relevant.

Google's PageRank (named for Larry Page) derives directly from this, in terms of a link being a citation. There is obviously more noise (and more opportunity and incentive for link fraud) than in scientific publishing, but the basic correlation remains. A large amount of variance in search results ranking is (still) explained by the number of domains linking to a given URL on the Internet, with attenuation based on the quality or authority of the linking domains. [1]

1. SEO Ranking Factors - Web Page FX (2015)
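The link-as-citation idea can be sketched with a simplified PageRank-style power iteration. This is the textbook form only, not Google's production ranking. The damping factor of 0.85 and the fixed iteration count are conventional assumptions, and the `pagerank` function and its `links` input format are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to (its
    'citations'). Returns a dict of page -> rank, summing to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

A page linked to by many pages, or by a few highly ranked ones, ends up with a higher rank. That is the attenuation by authority of the linking domains described above.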

The thing about Google is that it not only has inputs, but can determine based on human behavior in clicking, what kind of modification of the initial results should take place. This is an extremely dynamic situation, where clicking on results, the use of the back button, and subsequent repeated searching can provide evidence of less relevance. Of course, and in addition, personalization is important and useful.

## Simplicity of Search vs. Wall of Browse

Google's rapid rise in popularity, when faced with incumbents such as Yahoo! with its hundreds of humans (effectively doing the same CIA analyst task as before mechanization, computerization, and digitization), was clearly due to, in some sense, mastering relevance and deploying superior search algorithms. In particular, long-tail searches were famously rewarding, while short-head searches still occupied, well, the short head (large search volume). Google could do well for both kinds of searchers, and those in between. The I'm Feeling Lucky button was meant to show that the number one result was within easy reach of most, yet slowly expanding searches were well supported. Additional parameters for constraining searches to particular file types, a given domain, or a date range added to the tools available.

## Wither WordPress 2017

Faced with two decades of Google's search effectiveness and public-facing search tools, WordPress began a project to revamp the search interface and search algorithms for its plugins. The first Plugin page revamp was rolled out in 2015. The comments make clear that features people relied on were being broken. A 2016 Plugin Search prototype released to the public garnered the same kind of response: lots of things wrong with the new design, things broken that had worked before, minimal improvements, and a generally poor reception.

## WordPress Plugin Search Algorithm

Besides the user interface (adding, removing, and rearranging various bits), there is the basic algorithm. As per the previous history, the goal is to approximate human relevance. Relevance has to do with searcher intent, which is itself approximated by search terms and searching behavior (clicking, back button, searching again). So what does the WordPress team do in terms of ranking? Well, it makes the Last Updated date (more specifically, the tested-with WordPress version) a huge ranking factor. But this feature is one the community has no say over (meaning, it is not a feedback feature from actual behavior), and it is the easiest one to game (change a bit of text, resubmit the plugin, repeat after each WordPress version release).

## Ageism at WordPress

For Google (though indeed, it gets it wrong sometimes) age is a positive ranking factor. WordPress ignores this completely, and puts last updated as the only age-related factor, essentially the opposite of Google.

## Exact Match at WordPress

> We no longer have the exact match search in place. The new search is more relevant to current events. If you don't maintain it, it will fall out of rank. -Samuel Wood (Otto)

No exact match. Really. Actually. Honestly.

> This is a huge problem for people looking for an exact match, that is, they know the name of the plugin. I searched for Post Tags and Categories for Pages and it came up on the 5th page of results. I guess I should count my blessings as there are 163 pages of results for that query. 163 pages! If someone knows the name of the plugin (who cares if it is 2 years out of date, the plugin still works and I use it on multiple sites), but can’t get an exact match, just exactly how are they supposed to find what they are looking for?
>
> More relevant to current events shouldn’t destroy the relevance of historical events.2

2. New Plugin Directory Mostly Live (Make.WordPress.Org 03-2017)

The main response to this newly introduced search problem was simply to repeat the demand that the developer add a new version and a new tested-up-to WordPress version. But this is a bureaucrat's argument (the plugin developer has not updated the form correctly, therefore your request for their plugin is not legitimate).

> If the last update date is all that matters (stellar reviews, large numbers of active installs, and exact match text matters little), then the preferences and activity of the community are shown no respect. The community does not have control over when a plugin developer can/will update a plugin, but they have control over the other factors (that is, installing, using, rating positively, and searching for a plugin by name). This is not an edge case, as there are many plugins in this situation. They should not be penalized. They should have exact match text respected, along with the other community factors.[^New-Plugin-Directory-Mostly-Live]
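To illustrate what respecting exact match text along with the other community factors could look like, here is a purely hypothetical scoring sketch. It is not the WordPress.org algorithm. The `Plugin` fields, the `score` and `rank` functions, and every weight are assumptions chosen only to show the shape of the idea: an exact name match dominates, community signals count, and staleness is a mild, capped penalty.

```python
import math
from dataclasses import dataclass


@dataclass
class Plugin:
    name: str
    active_installs: int
    rating: float             # average rating, 0.0 to 5.0
    months_since_update: int


def score(plugin, query):
    """Hypothetical relevance score; all weights are illustrative."""
    q = query.strip().lower()
    name = plugin.name.strip().lower()

    total = 0.0
    if name == q:
        total += 100.0        # exact name match outweighs everything else
    elif q in name:
        total += 10.0         # partial name match
    # Community signals: installs (log-scaled) and rating.
    total += math.log10(plugin.active_installs + 1)
    total += plugin.rating
    # Staleness is a small penalty, capped at two years.
    total -= min(plugin.months_since_update, 24) / 12.0
    return total


def rank(plugins, query):
    """Return plugins ordered from most to least relevant for the query."""
    return sorted(plugins, key=lambda p: score(p, query), reverse=True)
```

With weights like these, a plugin searched for by its exact name lands first even if it has not been updated in two years, while recency still breaks ties among otherwise similar results.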

## Tested Up To vs. Works With

There was a useful feature which allowed users to vote on whether the most recent plugin version worked with a specific release of WordPress. It was a simple works/does not work choice and a WordPress release version drop-down. It was community-driven information. And for some plugins (newly updated or not) it provided some information (though of course it could be inaccurate). Still, it was something, and there are many reports that the information helped. That feature was removed. Now, when users complain about the removal, they are told that people didn't use it (of course they did, you are hearing from them). And they are then told that the plugin developer should update the plugin's metadata showing Tested up to. This is purely bureaucratic thinking. But it is worth asking who benefits from this situation. It can only be developers who want to promote their plugins that are otherwise being eclipsed by plugins that are updated rarely or not at all (usually the older ones).

## Support Forum Use Requirements

If a support forum is ignored by a developer, woe unto them. Regardless of the relevance of support topics, if they are not managed quickly and marked resolved, that will impact the search rank. Again, more bureaucratic thinking based on rule-enforcement. These kinds of ranking signals can only be considered legitimate by other developers who wish to penalize those who do not follow their (the WordPress Core developers') rules.

First, not everyone knows how the support forum works. In many cases you see developers for whom English is a second language forced to deal with questions in English (and with a Support Forum in English). Second, there are plugins like Contact Form 7 which, as of 30 March 2017, had 141 out of 703 issues resolved in the last two months. CF7 is very well known, but the usefulness and use of the support forum for this and other plugins is minimal at best.

The Support Forum for WordPress Plugins has always been a mixed bag. Some plugins simply have no one watching the support forum, so there is no information (except perhaps that someone else might have the same problem). Sometimes other users help answer, but they cannot mark the issue resolved, as only the issue creator and the developer have that ability. In many cases, developers state directly that they have a support forum at another location and ask that questions be addressed there (which of course some people ignore and post questions anyway).

Support Forum use should not be mandated, or used as a signal for relevance. Rather, it should (if kept) be optional, toggled on or off, with an optional URL pointing to an off-site support forum or simply an email address for support.

## A Plugin Repository is not General Search

Certainly, search in a plugin repository is not general search, but rather should be a more organized, faceted classification (tags and categories). But this is not the case. Rather, this kind of metadata is not organized, but ad hoc, and determined by developers alone. In the latest version, there is now a limit on tags (up to five), but where did that come from? From the algorithm folks, who prefer to have fewer signals to deal with (and no actual user behavior involved).
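A faceted classification can be sketched as filtering on controlled facet values and showing counts per value. The following is a minimal illustration, not a description of the plugin directory. The dictionary layout with `categories` and `tags` keys and the two function names are assumptions.

```python
from collections import Counter


def filter_by_facets(plugins, **selected):
    """Keep only plugins that carry every selected facet value.

    Example: filter_by_facets(plugins, categories="forms", tags="spam")
    """
    results = []
    for plugin in plugins:
        if all(value in plugin.get(facet, [])
               for facet, value in selected.items()):
            results.append(plugin)
    return results


def facet_counts(plugins, facet):
    """Count how many plugins carry each value of a facet, so a browse
    interface can show the number of results next to each choice."""
    counts = Counter()
    for plugin in plugins:
        # Use a set so a value repeated on one plugin is counted once.
        counts.update(set(plugin.get(facet, [])))
    return counts
```

The point of the design is that facet values come from an organized vocabulary, so users can narrow results step by step instead of guessing at ad hoc keywords.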

## A Lesson from An Electronic Cultural Atlas

To round out the UC Berkeley Ischool reminiscences, I recall vividly a talk by Lewis Lancaster, possibly the Platonic ideal of a gentleman scholar. He was someone people would do anything for, a truly magnanimous and gifted scholar. He saw clearly, decades ago, the need for interdisciplinary research in the humanities, and that technical skills are needed alongside those of the historian and cultural researcher in order to present information in a way that provides original insight. Digitization is but the first step; information retrieval and visualization meant to show things that are otherwise hidden, that is the real feat. To which end Lancaster founded the Electronic Cultural Atlas Initiative.

The problem with WordPress Plugin development is clear from even the most cursory evaluation:

> It seems like backend developers create such a design. Boring minimalism.3

3. Plugin Repository Redesign Beta Available (TorqueMag - June, 2016)

The heart of the failure (or the failure of the heart of it) is that there is no cross-disciplinary team involved. There are essentially programmers doing the work that should be done by experts from other fields. The provisioning of human relevance needs more than PHP and CSS code wranglers. It will continue to fail without the fresh air of authentic collaboration with, and leadership from, those who are not developers first and foremost. Facebook and WordPress interfaces are similarly bad, based on the same kind of DNA: run by founders who are at base PHP coders, driving an engineer-centric culture, whose insular thinking continues to produce an unremarkable product (whose minimal progress occasionally goes in reverse).

## Where to Begin Again, The Appstore

To begin again would be to first take people's lived experiences as a place to explore and respect, and their suggestions as something to follow. Doing a redesign is all very lofty, but how about first fixing what can be fixed easily? For example, plugin screenshots have no lightbox, something simple to fix. Sorting and filtering search results by certain criteria is something that has been asked for, for years, and ignored yet again.

Second, orient toward the kind of search that makes sense for a plugin repository, which is an appstore. Appstore search is difficult, as the atrocities that are Apple and Google demonstrate. However, there are some basic features that make sense: collections, categories, and the app (plugin) display page. These are all huge, low-hanging fruit for WordPress plugins. But of course they are not sexy, like search algorithms, to a back-end developer, so developers play with search algorithms while the user experience languishes and ticks down further.

For people to take the plugin store seriously, WordPress needs to take it seriously. In the face of requirements for plugin developers to regularly update their plugin (even when it is not needed) and babysit the required support ticket system, the ongoing force-feeding of a badly out-of-date plugin -- Hello Dolly -- is simply a joke.

## WordPress Form and Comment Spam

As with security in general, escaping the scourge of WordPress form and comment spam requires a layered approach. Here is what works.

## Databases and Behavioral Anti-Spam

The first step is the one that nowadays works the least well. In the beginning we had Akismet, and things got better, but this is an arms race, and Akismet has not been getting better. In WordPress, this battlefront has basically been ceded (with some exceptions, below). For things like Google's Gmail, this still works fairly well (along with manual rules), a vast majority of the time.

## Manual Rules and Keyword Blocking

Manual rules and keyword lists help block a particular subset of spam, namely that manually created by humans, with the purpose of pestering someone to hire a so-called SEO Expert, Web Designer, or Marketing services. By placing these highlighted keywords in the WordPress Admin > Settings > Discussion > Comment Blacklist field, they are not only used as a filter by the WordPress commenting function, but also used by Contact Form 7.
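The keyword filter idea can be sketched as a case-insensitive substring check against a one-term-per-line list. This is a simplified illustration, not WordPress's or Contact Form 7's own code. The function names `parse_blocklist` and `is_blocked` are hypothetical.

```python
def parse_blocklist(raw):
    """Turn a one-term-per-line text field into a clean list of terms."""
    return [line.strip().lower() for line in raw.splitlines() if line.strip()]


def is_blocked(message, blocklist):
    """Return True if any blocklisted term appears anywhere in the message."""
    text = message.lower()
    return any(term in text for term in blocklist)
```

Because this is substring matching, short terms can block legitimate messages (a term like "seo" would also match "museum"), so longer, more specific phrases are the safer choice.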

## JavaScript and/or Session Detection

The average bot is fairly simplistic and won't accept session cookies or have JavaScript enabled, so testing for one or both of these conditions will generally allow it to be ignored. For comment spam (on sites that must have comments enabled), the WordPress plugin WP-SpamShield is a fairly effective option. In the future, it might be better to ensure no plugins use PHP sessions, for performance reasons, but on a moderately busy site this shouldn't be much of a problem.
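One way to test both conditions at once is sketched below. It is not how WP-SpamShield is implemented; it is a generic illustration under stated assumptions. The server sets a session cookie and renders a small script that writes a token derived from that session into a hidden form field. A client that ignores cookies or does not run the script cannot submit a matching pair. The cookie name `session_id`, the field name `js_token`, and `SECRET_KEY` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load this
# from configuration rather than hard-coding it.
SECRET_KEY = b"replace-with-a-real-secret"


def expected_js_token(session_id):
    """Token the page's script is expected to place in a hidden field."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def looks_like_browser(cookies, form_data):
    """Return True only if the client kept the session cookie AND
    submitted the token that the page's script would have filled in."""
    session_id = cookies.get("session_id")
    if not session_id:
        return False  # the client did not accept or return the cookie
    submitted = form_data.get("js_token", "")
    expected = expected_js_token(session_id)
    # Constant-time comparison on bytes.
    return hmac.compare_digest(submitted.encode("utf-8"),
                               expected.encode("utf-8"))
```

A determined bot that executes scripts will pass this check, which is why it is only one layer among several.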

## Honeypot Form Fields

Another way to detect bots is to provide form fields that they see but that humans do not (via CSS). Bots will attempt to fill out these fields, and thereby have their submissions identified and silently rejected. For Contact Form 7, a good choice is the aptly named Contact Form 7 Honeypot. For WordPress account creation/registration, there is the Registration Honeypot.
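The server-side half of a honeypot can be sketched in a few lines. This is a generic illustration, not the code of either plugin named above. The field name `website_url` and the function names are hypothetical, and the field is assumed to be hidden from humans with CSS in the rendered form.

```python
# Hypothetical name of the hidden field; it should look tempting to a bot.
HONEYPOT_FIELD = "website_url"


def is_bot_submission(form_data, honeypot_field=HONEYPOT_FIELD):
    """A human never sees the hidden field, so any value in it
    indicates an automated submission."""
    value = form_data.get(honeypot_field) or ""
    return bool(str(value).strip())


def handle_submission(form_data):
    """Silently discard bot submissions; accept the rest."""
    if is_bot_submission(form_data):
        # No error is shown to the sender, so the bot gets no signal
        # that it was detected.
        return "discarded"
    return "accepted"
```

Rejecting silently matters: an explicit error message would tell the bot's author which field to leave empty.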