honeypot – Jerome Wincek

This is the last post in a series about plugins I’ve built for the Venango County Humane Society. The previous five posts covered shelter-specific WordPress work — adoptable pet listings, a Petstablished sync engine, a donations platform, a recurring events manager. All of them are specialized for nonprofit use cases; none of them would make sense outside of that context.

This post is about the one plugin in the stack that has nothing to do with animals, donations, or shelters. It’s a spam plugin. It’s called Simple Spam Shield, and the only connection to the shelter is that it was born from a shelter problem.

Here’s the problem. Within weeks of going live, the shelter’s volunteer application form, built with Jetpack’s Contact Form blocks, started receiving thousands of fake submissions: casino spam, pharma spam, SEO link dumps, the usual. There were so many that I had to wade through hundreds of pages of garbage to find the real volunteer applications. At the time, our only defense was WordPress’s built-in list of disallowed terms.

The obvious answer was Akismet. It’s the default WordPress spam solution, it’s free for personal sites, and it works well. But it sends every form submission to Akismet’s cloud service for analysis, which means every volunteer applicant’s name, email address, and personal information passes through a third-party server. For a nonprofit that handles sensitive community data, that felt wrong. The alternative, a commercial anti-spam plugin with a $50/year license, felt like the same paywall dynamic I’d been building around in the other plugins.

So I built a spam shield. No cloud. No API keys. No external dependencies. Everything runs locally on the WordPress server. It protects comments, WooCommerce product reviews, and Jetpack Contact Forms using a small bundle of proven mechanisms that don’t require phoning home. It’s been running on vcpahumane.org for a few weeks as the site’s only spam protection.

The repo is at github.com/jwincek/simple-spam-shield. GPL, zero-config out of the box, designed to be useful to any WordPress site that wants spam protection without a cloud dependency.

What “simple” means

The name is deliberate. “Simple” doesn’t mean the code is a single file; it’s a few thousand lines of PHP with config-driven guard definitions, a normalized data pipeline, and per-surface integration classes. “Simple” means the feature selection is small and focused: seven spam-detection mechanisms, three protected surfaces, one settings page, one log viewer. No machine learning, no remote API, no cloud dashboard, no premium tier.

The thesis is that a handful of well-implemented local checks (honeypot fields, time gates, duplicate detection, nonce validation, link counting, keyword matching, and optional behavioral scoring) catch the vast majority of automated spam without sending user data off-server. They don’t catch everything; a determined human spammer will get through. But against the automated form-crawling that floods small sites, they’re effective, and they preserve privacy.

The guard pipeline

The plugin’s core is a weighted, short-circuit pipeline called the Guard_Runner. Seven guards are defined in config/guards.json, each with a weight that determines execution order. The runner sorts guards by weight (highest first), initializes them, and runs them sequentially against the submission data. The first guard that fails blocks the submission, and no further guards run.
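A minimal Python sketch of the weighted, short-circuit idea: sort by weight, skip disabled guards, stop at the first failure. The plugin itself is PHP, and GuardSpec and run_guards are names invented here for illustration, not the plugin’s classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# A check returns None to let the submission through, or a reason string to block it.
Check = Callable[[dict], Optional[str]]


@dataclass
class GuardSpec:
    name: str
    weight: int   # higher weight runs earlier
    enabled: bool
    check: Check


def run_guards(guards: list[GuardSpec], submission: dict) -> Optional[tuple[str, str]]:
    """Return (guard name, reason) for the first failing guard, or None if all pass."""
    for guard in sorted(guards, key=lambda g: g.weight, reverse=True):
        if not guard.enabled:
            continue
        reason = guard.check(submission)
        if reason is not None:
            return guard.name, reason  # short-circuit: lower-weight guards never run
    return None


# Usage: the keyword guard is listed first but runs second because its weight is lower.
calls = []

def honeypot(s: dict) -> Optional[str]:
    calls.append("honeypot")
    return "honeypot field filled" if s.get("hp_field") else None

def keyword_block(s: dict) -> Optional[str]:
    calls.append("keyword_block")
    return "blocked keyword" if "casino" in s.get("content", "").lower() else None

guards = [
    GuardSpec("keyword_block", 60, True, keyword_block),
    GuardSpec("honeypot", 100, True, honeypot),
]
result = run_guards(guards, {"hp_field": "x", "content": "Casino bonus!"})
```

Here the honeypot rejects the submission and the keyword guard is never called, which is the whole point of putting cheap checks at the top.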

Guard	Weight	Default	What it does
Honeypot	100	On	Hidden form field that bots fill in, humans don’t
Duplicate	95	On	MD5 hash of content + author + email + IP, checked against a 60-second transient window
Time Gate	90	On	Rejects submissions faster than 3 seconds after page load
Nonce	80	On	Standard WordPress nonce verification
Link Limit	70	On	Rejects submissions with more than 3 URLs
Keyword Block	60	On	Case-insensitive matching against a blocklist
Behavioral	55	Off	Scores mouse movements, clicks, and time on page
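For illustration, here are three of the cheapest guards from the table as Python functions, using the table’s thresholds (3 seconds, 3 URLs). The field names hp_field and loaded_at are hypothetical, and counting only http(s):// URLs is an assumption about what “a link” means; the plugin’s real PHP may differ.

```python
import re
import time
from typing import Optional

MIN_SECONDS = 3   # table: submissions faster than 3 s after page load are rejected
MAX_LINKS = 3     # table: more than 3 URLs is rejected
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)  # assumption: only scheme-prefixed URLs count


def honeypot_check(fields: dict) -> Optional[str]:
    # "hp_field" is a hypothetical name for the hidden input; humans never see it, so it stays empty.
    if str(fields.get("hp_field", "")).strip():
        return "honeypot field was filled"
    return None


def time_gate_check(fields: dict, now: Optional[float] = None) -> Optional[str]:
    loaded_at = fields.get("loaded_at")  # hypothetical: Unix timestamp recorded at page load
    if loaded_at is None:
        return None  # field missing: skip rather than fail
    try:
        elapsed = (time.time() if now is None else now) - float(loaded_at)
    except (TypeError, ValueError):
        return "invalid page-load timestamp"
    if elapsed < MIN_SECONDS:
        return "submitted too quickly"
    return None


def link_limit_check(fields: dict) -> Optional[str]:
    links = len(URL_PATTERN.findall(fields.get("content", "")))
    if links > MAX_LINKS:
        return f"{links} links (limit is {MAX_LINKS})"
    return None
```

Each function does almost no work, which is why guards like these can sit at the top of the weight order.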

The weight ordering is intentional: cheap checks run first. The honeypot is a single empty-field check, essentially free. The duplicate guard is a transient lookup. The time gate is arithmetic. By the time the runner reaches keyword matching (which involves string operations against a list), most bot submissions have already been caught and rejected by a faster check.
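The duplicate guard’s transient lookup can be sketched with an in-memory dict standing in for a WordPress transient. The hashed fields (content, author, email, IP) and the 60-second window come from the table; the "|" separator and the DuplicateGuard class are assumptions made for this Python sketch.

```python
import hashlib
import time
from typing import Callable, Dict, Optional

WINDOW_SECONDS = 60  # the 60-second window from the table


class DuplicateGuard:
    """In-memory stand-in for a transient store: fingerprint -> expiry timestamp."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._seen: Dict[str, float] = {}
        self._clock = clock

    @staticmethod
    def fingerprint(content: str, author: str, email: str, ip: str) -> str:
        # The "|" separator is an assumption; it keeps ("ab", "c") and ("a", "bc") distinct.
        raw = "|".join([content, author, email, ip])
        return hashlib.md5(raw.encode("utf-8")).hexdigest()

    def check(self, content: str, author: str, email: str, ip: str) -> Optional[str]:
        key = self.fingerprint(content, author, email, ip)
        now = self._clock()
        expires = self._seen.get(key)
        if expires is not None and expires > now:
            return "duplicate submission within 60 seconds"
        # Remember this submission for WINDOW_SECONDS, like a transient with a 60 s expiry.
        self._seen[key] = now + WINDOW_SECONDS
        return None
```

A plain dict never evicts old keys, so a long-running process would need cleanup; WordPress transients carry their own expiry, which is one reason they suit this job. MD5 is used here only as a compact fingerprint, not for security.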

All seven guards implement a shared Guard_Interface and extend Abstract_Guard, so adding an eighth guard is a matter of writing one class and adding an entry to guards.json. The pipeline doesn’t need to know about the new guard’s internals — just its weight and whether it’s enabled.
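To make the “one class plus one config entry” claim concrete, here is a Python analogue using abc. Guard plays the combined role of Guard_Interface and Abstract_Guard, and EmptyContentGuard is a hypothetical eighth guard. The registry and the JSON shape are invented for this sketch and are not the real guards.json schema.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Type


class Guard(ABC):
    """Loose Python analogue of Guard_Interface plus Abstract_Guard."""

    name = ""

    def __init__(self, weight: int) -> None:
        self.weight = weight

    @abstractmethod
    def check(self, submission: dict) -> Optional[str]:
        """Return None to pass, or a reason string to block."""


class HoneypotGuard(Guard):
    name = "honeypot"

    def check(self, submission: dict) -> Optional[str]:
        return "honeypot field filled" if submission.get("hp_field") else None


# Hypothetical eighth guard: step one is a single new class...
class EmptyContentGuard(Guard):
    name = "empty_content"

    def check(self, submission: dict) -> Optional[str]:
        return None if submission.get("content", "").strip() else "empty content"


REGISTRY: Dict[str, Type[Guard]] = {cls.name: cls for cls in (HoneypotGuard, EmptyContentGuard)}

# ...step two is one config entry. This JSON shape is invented for the sketch.
CONFIG = json.loads("""
[
  {"name": "empty_content", "weight": 50, "enabled": true},
  {"name": "honeypot", "weight": 100, "enabled": true}
]
""")


def build_guards(config: List[dict]) -> List[Guard]:
    """Instantiate enabled guards from config and order them by weight, highest first."""
    guards = [REGISTRY[e["name"]](e["weight"]) for e in config if e.get("enabled", True)]
    return sorted(guards, key=lambda g: g.weight, reverse=True)
```

The builder only reads each entry’s name, weight, and enabled flag; everything guard-specific lives inside the class.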

Three surfaces, one normalized layer

The plugin protects three different form systems, each with its own submission lifecycle:

WordPress comments: intercepted via the preprocess_comment filter at priority 1
WooCommerce product reviews: intercepted via woocommerce_new_comment (reviews are stored as comments but go through WC’s own flow)
Jetpack Contact Forms: intercepted via the jetpack_contact_form_is_spam filter

Each surface has a thin integration class in includes/integrations/ whose only job is to normalize the submission data into a common format — content, author, email, plus any JS-injected fields (honeypot value, nonce, timestamp) — and pass it to the Guard_Runner. The guards never need to know which surface the data came from.
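A sketch of the normalization step in Python. The comment keys (comment_content, comment_author, comment_author_email) are the ones WordPress passes to preprocess_comment; the contact-form labels (Message, Name, Email) and the injected field names (hp_field, nonce, loaded_at) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

JS_FIELDS = ("hp_field", "nonce", "loaded_at")  # hypothetical names for the injected inputs


@dataclass
class Submission:
    """The common shape every surface is reduced to before the guards see it."""
    content: str
    author: str
    email: str
    ip: str
    extras: Dict[str, str] = field(default_factory=dict)  # honeypot value, nonce, timestamp


def _extras(post: dict) -> Dict[str, str]:
    return {k: post[k] for k in JS_FIELDS if k in post}


def from_comment(commentdata: dict, post: dict, ip: str) -> Submission:
    # Keys mirror the array WordPress passes to the preprocess_comment filter.
    return Submission(
        content=commentdata.get("comment_content", ""),
        author=commentdata.get("comment_author", ""),
        email=commentdata.get("comment_author_email", ""),
        ip=ip,
        extras=_extras(post),
    )


def from_contact_form(fields: dict, post: dict, ip: str) -> Submission:
    # Hypothetical field labels; a real integration would map each form's own fields.
    return Submission(
        content=fields.get("Message", ""),
        author=fields.get("Name", ""),
        email=fields.get("Email", ""),
        ip=ip,
        extras=_extras(post),
    )
```

Once two different surfaces produce equal Submission objects, the guards have no way (and no need) to tell them apart.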

This is the same pattern as the Petstablished sync plugin’s abilities layer: a normalized interface between the outside world and the core logic, so the core logic stays clean and testable regardless of how many input surfaces you add. If someone wanted to add protection for Gravity Forms or WPForms, they’d write one integration class, normalize the data, and the existing seven guards would apply automatically.

The Jetpack problem (and the two-phase workaround)

Jetpack Contact Forms are the hardest surface to protect, and the reason is worth documenting because other plugin developers will run into the same wall.

When a Jetpack form is submitted, Jetpack’s processor recognizes only the fields defined in the form’s configuration. Any extra fields that the plugin injects (the honeypot field, the nonce, the timestamp, the behavioral data) are silently stripped from $_POST before Jetpack’s spam filter fires. By the time the plugin’s jetpack_contact_form_is_spam filter runs, the JS-injected guard data is gone.

The solution is a two-phase pipeline:

Phase 1 runs before Jetpack processes the form. At this point, raw $_POST still contains the injected fields, so the JS-dependent guards (honeypot, nonce, time gate, behavioral) can check their data and set a rejection flag.

Phase 2 runs during Jetpack’s own spam filter. If Phase 1 already flagged the submission, it returns immediately. Otherwise, it runs the content-based guards (keyword block, link limit, duplicate) against Jetpack’s structured form data, which is still available at this point even though the JS fields aren’t.
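The two phases can be sketched as one request-scoped object whose rejection flag carries Phase 1’s verdict into Phase 2. This is illustrative Python, not the plugin’s code: TwoPhaseShield is a made-up name, the early hook that runs Phase 1 isn’t specified here, and in PHP a static property or similar per-request state would play the role of the instance attribute.

```python
from typing import Callable, List, Optional

Check = Callable[[dict], Optional[str]]


class TwoPhaseShield:
    """One instance per request; the rejection flag carries phase 1's verdict into phase 2."""

    def __init__(self, js_guards: List[Check], content_guards: List[Check]) -> None:
        self.js_guards = js_guards
        self.content_guards = content_guards
        self.rejection: Optional[str] = None

    def phase1(self, raw_post: dict) -> None:
        # Runs early, while the raw POST data still holds the injected fields.
        for guard in self.js_guards:
            reason = guard(raw_post)
            if reason:
                self.rejection = reason
                return

    def phase2(self, form_fields: dict) -> bool:
        """Return True for spam; this is where the jetpack_contact_form_is_spam callback would land."""
        if self.rejection:
            return True  # phase 1 already decided; skip the content checks
        for guard in self.content_guards:
            reason = guard(form_fields)
            if reason:
                self.rejection = reason
                return True
        return False
```

If Phase 1 never fires, Phase 2 still runs the content guards on its own, which matches the fallback behavior described for the plugin.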

The guards themselves handle the edge case gracefully: if a guard expects a field that’s missing (because Jetpack stripped it), it skips rather than hard-failing. This means even if Phase 1 doesn’t fire for some reason, the content-based guards still protect the form. Defense in depth, with graceful degradation.
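The skip-instead-of-fail rule amounts to a three-valued result where only an explicit FAIL blocks. A minimal sketch; Result, requires, and blocked are names made up for illustration.

```python
from enum import Enum
from typing import Callable, Iterable


class Result(Enum):
    PASS = "pass"
    FAIL = "fail"
    SKIP = "skip"


def requires(field_name: str, check: Callable[[str], bool]) -> Callable[[dict], Result]:
    """Wrap a field-level check so a missing field yields SKIP instead of FAIL."""
    def guard(fields: dict) -> Result:
        if field_name not in fields:
            return Result.SKIP  # stripped or never injected: don't punish the submitter
        return Result.PASS if check(fields[field_name]) else Result.FAIL
    return guard


def blocked(guards: Iterable[Callable[[dict], Result]], fields: dict) -> bool:
    # Only an explicit FAIL blocks; SKIP is treated like PASS.
    return any(g(fields) is Result.FAIL for g in guards)


honeypot = requires("hp_field", lambda v: v == "")
no_casino = requires("message", lambda v: "casino" not in v.lower())
```

With the honeypot field stripped, the honeypot guard reports SKIP and the content guard still decides.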

This is the kind of integration problem that doesn’t show up in documentation or tutorials. Jetpack’s field-stripping behavior is undocumented and invisible until you try to inject custom fields into a form submission and watch them disappear. If you’re building any plugin that needs to intercept Jetpack form data, plan for this.

The keyword list problem

Let me be honest about the plugin’s current biggest weakness: the default keyword blocklist has seven entries.

‘casino, poker, viagra, cialis, crypto airdrop, free money, click here now’

Seven words. That’s the out-of-the-box protection against keyword-based spam. The other six guards (honeypot, time gate, nonce, duplicate, link limit, behavioral) don’t depend on keywords and work fine, but keyword blocking is the guard that catches content-aware spam: submissions crafted to look human but containing telltale phrases. Seven keywords is not enough for that.

The reason it’s seven is that I’ve been cautious about false positives. Every keyword added to the default list is a keyword that could block a legitimate submission on someone’s site I’ve never seen. “Casino” is safe; no legitimate volunteer application mentions casinos. But “free” alone would block real content, and “buy” would block WooCommerce review discussions. The default list needs to be universally safe, which means it needs to be conservative, which means it’s small.

This is the single most useful contribution someone could make to this plugin right now: a curated, well-tested default keyword list that’s aggressive enough to catch common spam patterns but conservative enough to avoid false positives on typical WordPress sites. If you maintain a WordPress site and have access to your spam folder, the phrases in there are exactly what this list needs. Open an issue, paste the patterns, and I’ll merge them.

The admin can add site-specific keywords from the settings page (it’s a textarea, one keyword per line), but the defaults should be good enough that most sites don’t need to touch them.
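A sketch of how the defaults and the settings textarea could combine into one case-insensitive matcher. Whole-word matching is an assumption here, not documented plugin behavior; it’s shown because it speaks directly to the false-positive worry, since a plain substring check for “cialis” would also block “specialist”.

```python
from __future__ import annotations

import re
from typing import Iterable, List, Optional

# The seven defaults quoted above.
DEFAULT_KEYWORDS = ["casino", "poker", "viagra", "cialis", "crypto airdrop", "free money", "click here now"]


def parse_textarea(raw: str) -> List[str]:
    """Settings textarea: one keyword per line, blank lines ignored, case folded."""
    return [line.strip().lower() for line in raw.splitlines() if line.strip()]


def compile_blocklist(keywords: Iterable[str]) -> Optional[re.Pattern]:
    """Build one case-insensitive, whole-word regex from the keyword list."""
    terms = sorted({k.strip().lower() for k in keywords if k.strip()}, key=len, reverse=True)
    if not terms:
        return None
    # \b boundaries are a design choice: substring matching would flag "specialist".
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)


def keyword_check(content: str, blocklist: Optional[re.Pattern]) -> Optional[str]:
    if blocklist is None:
        return None
    match = blocklist.search(content)
    return f"blocked keyword: {match.group(0).lower()}" if match else None


# Usage: defaults plus two site-specific lines typed into the textarea.
blocklist = compile_blocklist(DEFAULT_KEYWORDS + parse_textarea("Essay Writing\n\n  seo services \n"))
```

Compiling everything into a single alternation means one regex scan per submission regardless of how long the list grows.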

Allowlisting and the privacy model

Before any guard runs, the pipeline checks the submitter’s IP and email against an allowlist. The allowlist supports exact IPs, CIDR ranges (e.g., 10.0.0.0/8), exact email addresses, and domain patterns (e.g., @trusted.org). Allowlisted submissions bypass all guards entirely.
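The four allowlist entry types map neatly onto Python’s standard ipaddress module. This is a sketch of the matching rules as described, with is_allowlisted as a hypothetical name; edge cases such as whether a domain pattern should also match subdomains are deliberately left out.

```python
import ipaddress
from typing import Iterable


def is_allowlisted(ip: str, email: str, entries: Iterable[str]) -> bool:
    email = email.strip().lower()
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        addr = None  # unparsable IP: only email entries can match
    for entry in entries:
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry.startswith("@"):
            if email.endswith(entry):  # domain pattern, e.g. @trusted.org
                return True
        elif "@" in entry:
            if email == entry:  # exact email address
                return True
        elif addr is not None:
            try:
                # An exact IP parses as a /32 (or /128) network, so one code path covers both.
                network = ipaddress.ip_network(entry, strict=False)
            except ValueError:
                continue
            if addr.version == network.version and addr in network:
                return True
    return False


ALLOWLIST = ["10.0.0.0/8", "203.0.113.7", "volunteer@example.org", "@trusted.org"]
```

Keeping the leading "@" in the comparison is what stops a pattern like @trusted.org from matching an address at nottrusted.org.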

The broader privacy model is simple: no data leaves the server. Form submissions are checked locally. Blocked attempts are logged to a custom table in the site’s own database. The log captures the guard that triggered, the reason, a content excerpt, the IP, and the user agent: enough to diagnose false positives, not enough to build a surveillance profile.
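A sketch of what a log row limited to those fields might look like. The 200-character excerpt cap and the timestamp field are assumptions added for the example; the post doesn’t specify either.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

EXCERPT_CHARS = 200  # hypothetical cap; the post doesn't give the real excerpt length


@dataclass(frozen=True)
class BlockedAttempt:
    """One log row: the five fields the post lists, plus a timestamp added for this sketch."""
    guard: str
    reason: str
    excerpt: str
    ip: str
    user_agent: str
    logged_at: str


def make_log_entry(guard: str, reason: str, content: str, ip: str,
                   user_agent: str, now: Optional[datetime] = None) -> BlockedAttempt:
    now = now or datetime.now(timezone.utc)
    flat = " ".join(content.split())  # collapse newlines and runs of spaces for display
    if len(flat) > EXCERPT_CHARS:
        flat = flat[:EXCERPT_CHARS - 1] + "…"  # keep only a short excerpt, never the full body
    # This sketch stores no author name or email address.
    return BlockedAttempt(guard, reason, flat, ip, user_agent, now.isoformat())
```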

The log can be disabled entirely from the settings page, and uninstall.php drops the log table and deletes all plugin options and transients. A clean uninstall leaves no orphaned data behind.

This is the part of the plugin I feel most strongly about. Spam protection and privacy should not be in tension. Cloud-based spam services exist because they can aggregate data across millions of sites to build better models, and that’s genuinely effective. But for a small nonprofit handling volunteer applications and donation forms, sending that data to a third party isn’t worth the tradeoff. A local-only approach is good enough for the threat model, and it respects the people filling out the forms.

What’s still open

Features that would make the plugin meaningfully better, in rough order of impact:

1. A larger, better-curated default keyword list. I said this above but it bears repeating: this is the highest-leverage contribution anyone can make. The plugin’s architecture is solid; its vocabulary is anemic. If you have a collection of spam phrases from your own site, please share them.

2. A “block this” button in the log viewer. Currently, if an admin sees a blocked submission and wants to add the offending phrase to the keyword list, they have to copy it, navigate to settings, paste it into the textarea, and save. A one-click “add to blocklist” action from the log viewer would close that loop.

3. An “allowlist this” button in the log viewer. Same idea: if a legitimate submission was blocked, the admin should be able to allowlist the IP or email directly from the log entry.

4. Gravity Forms / WPForms / CF7 integrations. The normalized data layer makes this straightforward: one integration class per form plugin. I’ve only built the three surfaces I needed (comments, WC, Jetpack). If you use a different form plugin and want to contribute an integration, the architecture is ready for it.

5. A community-maintained blocklist. The most ambitious version of item 1: a shared, versioned keyword list that sites can subscribe to (via a simple GitHub-hosted JSON file, not a cloud service). Sites would pull keyword updates on a schedule without sending any data back. This preserves the no-cloud model while benefiting from collective intelligence. Not built yet, but the architecture would support it cleanly.
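Since item 5 isn’t built, this is purely a sketch of the pull-only idea under assumed details: the JSON shape ({"version": ..., "keywords": [...]}) and every function name are hypothetical. The properties it tries to show are that the request carries no site data, stale or malformed payloads are ignored, and the shared list is stored apart from the admin’s own keywords.

```python
import json
import urllib.request
from typing import Iterable, List, Tuple


def fetch_shared_list(url: str, timeout: float = 10.0) -> str:
    """Plain GET of a static JSON file; no submission data goes into the request."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def apply_update(body: str, current_version: int, current_shared: List[str]) -> Tuple[int, List[str]]:
    """Adopt the shared list only if it parses and is newer; otherwise keep what we have."""
    try:
        data = json.loads(body)
        version = int(data["version"])
        shared = sorted({str(k).strip().lower() for k in data["keywords"] if str(k).strip()})
    except (ValueError, KeyError, TypeError):
        return current_version, current_shared
    if version <= current_version:
        return current_version, current_shared
    return version, shared


def effective_keywords(local: Iterable[str], shared: Iterable[str]) -> List[str]:
    # Kept separate from the admin's own list so upstream removals propagate.
    return sorted({k.strip().lower() for k in local if k.strip()} | set(shared))
```

A scheduled job (WP-Cron, in the plugin’s world) would call the fetch, pass the body to apply_update, and store the result; if the fetch fails, nothing changes.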

Why this is the last post in the series

This series started with a post about a pet adoption plugin, the kind of thing that only makes sense if you know about the Venango County Humane Society specifically. It ends with a spam plugin that has nothing to do with animals, shelters, or northwestern Pennsylvania.

That’s the arc I want to name explicitly, because I think it generalizes.

When you start building for a specific organization, especially a small nonprofit doing cost-critical work that shouldn’t need to buy solutions off the shelf, you build the specific things first. A pet sync engine. A donation platform. An events manager. Each one is tailored to one organization’s needs, and each one is useful to other organizations with similar needs. The circle of relevance is small but real.

But along the way you inevitably build generic things too. A spam plugin. A caching pattern. A config-driven registration framework. An edit.asset.php file that you document in a blog post so the next developer doesn’t lose an hour to the same gotcha. These are the by-products of specific work that turn out to be useful to everyone.

I think this is how open-source nonprofit infrastructure actually gets built. Not by someone deciding to build “a platform for nonprofits” in the abstract, but by someone solving real problems for a real organization, publishing the solutions, and discovering that the specific and the generic are interleaved in ways you can’t predict in advance.

Six plugins, six posts, one shelter. If any of them are useful to you, whether you run a shelter, a nonprofit, a small business, or just a WordPress site that gets too much spam, I’m glad. And if you want to help make any of them better, the repos are all open and I’ll be here.

The repos:

simple-spam-shield — this post
vcpahumane-pet-companion — adoptable pet blocks for partner sites
vcpahumane-petstablished-sync — Petstablished sync engine
vcpahumane-wc-donations — open-source donations platform
vcpahumane-shelter-events-wrapper — recurring events over The Events Calendar

All GPL. All welcome contributions of any size.

Thank you for reading this series. If you’d like to start from the beginning, the first post is here.


Simple Spam Shield: a lightweight, cloud-free WordPress anti-spam plugin