This is the last post in a series about plugins I’ve built for the Venango County Humane Society. The previous five posts covered shelter-specific WordPress work — adoptable pet listings, a Petstablished sync engine, a donations platform, a recurring events manager. All of them are specialized for nonprofit use cases; none of them would make sense outside of that context.
This post is about the one plugin in the stack that has nothing to do with animals, donations, or shelters. It’s a spam plugin. It’s called Simple Spam Shield, and the only connection to the shelter is that it was born from a shelter problem.
Here’s the problem. Within weeks of going live, the shelter’s volunteer application form, built with Jetpack’s Contact Form blocks, started receiving thousands of fake submissions. Casino spam, pharma spam, SEO link dumps, the usual. There were so many that I had to wade through hundreds of pages of garbage to find the real volunteer applications. At the time, our only defense was WordPress’s built-in list of disallowed terms, and it wasn’t keeping up.
The obvious answer was Akismet. It’s the default WordPress spam solution, it’s free for personal sites, and it works well. But it sends every form submission to Akismet’s cloud service for analysis, which means every volunteer applicant’s name, email address, and personal information passes through a third-party server. For a nonprofit that handles sensitive community data, that felt wrong. The alternative — installing a commercial anti-spam plugin with a $50/year license — felt like the same paywall dynamic I’d been building around in the other plugins.
So I built a spam shield. No cloud. No API keys. No external dependencies. Everything runs locally on the WordPress server. It protects comments, WooCommerce product reviews, and Jetpack Contact Forms using a small bundle of proven mechanisms that don’t require phoning home. It’s been running on vcpahumane.org for a few weeks as the site’s only spam protection.
The repo is at github.com/jwincek/simple-spam-shield. GPL, zero-config out of the box, designed to be useful to any WordPress site that wants spam protection without a cloud dependency.
What “simple” means
The name is deliberate. “Simple” doesn’t mean the code is a single file — it’s a few thousand lines of PHP with config-driven guard definitions, a normalized data pipeline, and per-surface integration classes. “Simple” means the feature selection is small and focused: seven spam-detection mechanisms, three protected surfaces, one settings page, one log viewer. No machine learning, no remote API, no cloud dashboard, no premium tier.
The thesis is that a handful of well-implemented local checks — honeypot fields, time gates, duplicate detection, nonce validation, link counting, keyword matching, and optional behavioral scoring — catch the vast majority of automated spam without needing to send user data off-server. They don’t catch everything. A determined human spammer will get through. But for the kind of automated form-crawling that floods small sites, they’re effective and they preserve privacy.
The guard pipeline
The plugin’s core is a weighted, short-circuit pipeline called the Guard_Runner. Seven guards are defined in config/guards.json, each with a weight that determines execution order. The runner sorts guards by weight (highest first), initializes them, and runs them sequentially against the submission data. The first guard that fails blocks the submission — no further guards run.
| Guard | Weight | Default | What it does |
|---|---|---|---|
| Honeypot | 100 | On | Hidden form field that bots fill in, humans don’t |
| Duplicate | 95 | On | MD5 hash of content + author + email + IP, checked against a 60-second transient window |
| Time Gate | 90 | On | Rejects submissions made less than 3 seconds after page load |
| Nonce | 80 | On | Standard WordPress nonce verification |
| Link Limit | 70 | On | Rejects submissions with more than 3 URLs |
| Keyword Block | 60 | On | Case-insensitive matching against a blocklist |
| Behavioral | 55 | Off | Scores mouse movements, clicks, and time on page |
The weight ordering is intentional: cheap checks run first. The honeypot is a single empty-field check — essentially free. The duplicate guard is a transient lookup. The time gate is arithmetic. By the time the runner reaches keyword matching (which involves string operations against a list), most bot submissions have already been caught and rejected by a faster check.
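Here is a minimal Python sketch of four of the cheap checks, meant to make them concrete (the plugin itself is PHP). It uses a plain dict in place of WordPress transients, joins the duplicate-hash inputs with a "|" separator, and counts URLs by their http:// or https:// prefixes. None of those details come from the plugin’s source, and the function names are hypothetical.

```python
import hashlib
import re
import time
from typing import Dict, Optional

# Stand-in for WordPress transients: MD5 key -> expiry timestamp (hypothetical).
_recent_hashes: Dict[str, float] = {}

DUPLICATE_WINDOW = 60  # seconds, from the table above
MIN_SECONDS = 3        # time gate threshold
MAX_LINKS = 3          # link limit threshold
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def honeypot_ok(honeypot_value: str) -> bool:
    # Humans never see the field, so any value at all means a bot filled it in.
    return honeypot_value == ""


def duplicate_ok(content: str, author: str, email: str, ip: str,
                 now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    raw = "|".join([content, author, email, ip])  # separator is an assumption
    key = hashlib.md5(raw.encode("utf-8")).hexdigest()
    expiry = _recent_hashes.get(key)
    if expiry is not None and expiry > now:
        return False  # identical submission already seen inside the window
    _recent_hashes[key] = now + DUPLICATE_WINDOW
    return True


def time_gate_ok(rendered_at: float, submitted_at: float) -> bool:
    # Reject anything submitted less than MIN_SECONDS after the page loaded.
    return submitted_at - rendered_at >= MIN_SECONDS


def link_limit_ok(content: str) -> bool:
    # Allow at most MAX_LINKS URLs in the submitted content.
    return len(URL_PATTERN.findall(content)) <= MAX_LINKS
```

In this sketch, a rejected duplicate does not refresh the stored expiry. A bot replaying the same payload is therefore rejected until 60 seconds after the first copy, then allowed through once.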
All seven guards implement a shared Guard_Interface and extend Abstract_Guard, so adding an eighth guard is a matter of writing one class and adding an entry to guards.json. The pipeline doesn’t need to know about the new guard’s internals — just its weight and whether it’s enabled.
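The runner itself fits in a few dozen lines. This Python version is an illustration, not a port. It folds Guard_Interface and Abstract_Guard into a single abstract base class, and the JSON shape (id, weight, enabled) is a guess at what a guards.json entry might hold. To keep it short, only two guards are wired up.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional, Type


@dataclass
class Verdict:
    passed: bool
    guard: str = ""
    reason: str = ""


class AbstractGuard(ABC):
    """Plays the role of Guard_Interface + Abstract_Guard in this sketch."""
    name = "abstract"

    @abstractmethod
    def check(self, data: dict) -> Optional[str]:
        """Return a rejection reason, or None if the submission passes."""


class HoneypotGuard(AbstractGuard):
    name = "honeypot"

    def check(self, data: dict) -> Optional[str]:
        return "honeypot field was filled" if data.get("honeypot") else None


class LinkLimitGuard(AbstractGuard):
    name = "link_limit"

    def check(self, data: dict) -> Optional[str]:
        # Crude URL count for illustration only.
        links = data.get("content", "").count("://")
        return f"{links} links (max 3)" if links > 3 else None


# Hypothetical config shape; the real guards.json may use different keys.
GUARDS_JSON = """
[
  {"id": "link_limit", "weight": 70,  "enabled": true},
  {"id": "honeypot",   "weight": 100, "enabled": true},
  {"id": "behavioral", "weight": 55,  "enabled": false}
]
"""

REGISTRY: Dict[str, Type[AbstractGuard]] = {
    "honeypot": HoneypotGuard,
    "link_limit": LinkLimitGuard,
}


class GuardRunner:
    def __init__(self, config_text: str, registry: Dict[str, Type[AbstractGuard]]):
        entries = [e for e in json.loads(config_text) if e["enabled"]]
        entries.sort(key=lambda e: e["weight"], reverse=True)  # highest weight first
        self.guards: List[AbstractGuard] = [registry[e["id"]]() for e in entries]

    def run(self, data: dict) -> Verdict:
        for guard in self.guards:
            reason = guard.check(data)
            if reason is not None:
                # Short-circuit: the first failing guard decides.
                return Verdict(False, guard.name, reason)
        return Verdict(True)
```

Adding a guard here works the way the post describes: write one class, register it, and add a config entry. The runner’s loop never changes.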
Three surfaces, one normalized layer
The plugin protects three different form systems, each with its own submission lifecycle:
- WordPress comments: intercepted via the preprocess_comment filter at priority 1
- WooCommerce product reviews: intercepted via woocommerce_new_comment (reviews are stored as comments but go through WC’s own flow)
- Jetpack Contact Forms: intercepted via the jetpack_contact_form_is_spam filter
Each surface has a thin integration class in includes/integrations/ whose only job is to normalize the submission data into a common format — content, author, email, plus any JS-injected fields (honeypot value, nonce, timestamp) — and pass it to the Guard_Runner. The guards never need to know which surface the data came from.
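The normalization step might look something like this Python sketch. The comment keys (comment_content, comment_author, comment_author_email) are the ones WordPress passes through preprocess_comment. The JS field names and the contact-form labels are hypothetical placeholders, because the post doesn’t list them.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical names for the fields the plugin's JavaScript injects.
JS_KEYS = ("honeypot", "nonce", "timestamp")


@dataclass
class Submission:
    """The common shape every integration hands to the runner."""
    content: str
    author: str
    email: str
    ip: str
    surface: str
    js_fields: Dict[str, str] = field(default_factory=dict)


def _js_fields(post: Dict[str, str]) -> Dict[str, str]:
    # Copy only the injected fields that actually arrived.
    return {k: post[k] for k in JS_KEYS if k in post}


def from_comment(commentdata: Dict[str, str], post: Dict[str, str], ip: str) -> Submission:
    # Keys match the array WordPress passes to preprocess_comment.
    return Submission(
        content=commentdata.get("comment_content", ""),
        author=commentdata.get("comment_author", ""),
        email=commentdata.get("comment_author_email", ""),
        ip=ip,
        surface="comment",
        js_fields=_js_fields(post),
    )


def from_contact_form(fields: Dict[str, str], post: Dict[str, str], ip: str) -> Submission:
    # Hypothetical labels; real forms name their fields however the site builder chose.
    return Submission(
        content=fields.get("Message", ""),
        author=fields.get("Name", ""),
        email=fields.get("Email", ""),
        ip=ip,
        surface="contact_form",
        js_fields=_js_fields(post),
    )
```

Each new surface only needs another from_* function. Everything downstream sees the same Submission shape.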
This is the same pattern as the Petstablished sync plugin’s abilities layer: a normalized interface between the outside world and the core logic, so the core logic stays clean and testable regardless of how many input surfaces you add. If someone wanted to add protection for Gravity Forms or WPForms, they’d write one integration class, normalize the data, and the existing seven guards would apply automatically.
The Jetpack problem (and the two-phase workaround)
Jetpack Contact Forms are the hardest surface to protect, and the reason is worth documenting because other plugin developers will run into the same wall.
When a Jetpack form is submitted, Jetpack’s processor recognizes only the fields defined in the form’s configuration. Any extra fields that the plugin injects — the honeypot field, the nonce, the timestamp, the behavioral data — are silently stripped from $_POST before Jetpack’s spam filter fires. By the time the plugin’s jetpack_contact_form_is_spam filter runs, the JS-injected guard data is gone.
The solution is a two-phase pipeline:
Phase 1 runs before Jetpack processes the form. At this point, raw $_POST still contains the injected fields, so JS-dependent guards (honeypot, nonce, time gate, behavioral) can check their data and set a rejection flag.
Phase 2 runs during Jetpack’s own spam filter. If Phase 1 already flagged the submission, it returns immediately. Otherwise, it runs content-based guards (keyword block, link limit, duplicate) against Jetpack’s structured form data — which is available at this point even though the JS fields aren’t.
The guards themselves handle the edge case gracefully: if a guard expects a field that’s missing (because Jetpack stripped it), it skips rather than hard-failing. This means even if Phase 1 doesn’t fire for some reason, the content-based guards still protect the form. Defense in depth, with graceful degradation.
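A rough Python sketch of the two-phase flow follows. The per-request flag object, the guard functions, and the field names are all hypothetical. In the plugin, the phases are tied to WordPress hooks, and the post doesn’t say which hook Phase 1 uses. The behaviors worth copying are that Phase 2 trusts an earlier flag and that a guard returns None (skip) when its field is missing.

```python
from typing import Callable, List, Optional

Guard = Callable[[dict], Optional[str]]  # returns a rejection reason, or None


def honeypot(data: dict) -> Optional[str]:
    if "honeypot" not in data:
        return None  # field missing (e.g. stripped): skip, don't fail
    return "honeypot filled" if data["honeypot"] else None


def time_gate(data: dict) -> Optional[str]:
    if "rendered_at" not in data or "submitted_at" not in data:
        return None  # skip when the timestamps didn't survive
    return "too fast" if data["submitted_at"] - data["rendered_at"] < 3 else None


def keyword_block(data: dict) -> Optional[str]:
    text = data.get("content", "").lower()
    for word in ("casino", "viagra"):  # tiny list for illustration
        if word in text:
            return f"keyword: {word}"
    return None


JS_GUARDS: List[Guard] = [honeypot, time_gate]
CONTENT_GUARDS: List[Guard] = [keyword_block]


class TwoPhaseCheck:
    """Holds the Phase 1 flag for a single request (hypothetical design)."""

    def __init__(self) -> None:
        self.flag: Optional[str] = None

    def phase1(self, raw_post: dict) -> None:
        # Before the form processor runs, while injected fields are still present.
        for guard in JS_GUARDS:
            reason = guard(raw_post)
            if reason:
                self.flag = reason
                return

    def phase2(self, form_fields: dict) -> bool:
        # Inside the form's own spam filter; True means "treat as spam".
        if self.flag:
            return True
        return any(guard(form_fields) for guard in CONTENT_GUARDS)
```

The last case in the tests below is the degraded path: Phase 1 never ran, and the content guard still catches the spam.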
This is the kind of integration problem that doesn’t show up in documentation or tutorials. Jetpack’s field-stripping behavior is undocumented and invisible until you try to inject custom fields into a form submission and watch them disappear. If you’re building any plugin that needs to intercept Jetpack form data, plan for this.
The keyword list problem
Let me be honest about the plugin’s current biggest weakness: the default keyword blocklist has seven entries.
‘casino, poker, viagra, cialis, crypto airdrop, free money, click here now’
Seven words. That’s the out-of-the-box protection against keyword-based spam. The other six guards (honeypot, time gate, nonce, duplicate, link limit, behavioral) don’t depend on keywords and work fine, but keyword blocking is the guard that catches content-aware spam — the submissions that are crafted to look human-like but contain telltale phrases. Seven keywords is not enough for that.
The reason it’s seven is that I’ve been cautious about false positives. Every keyword added to the default list is a keyword that could block a legitimate submission on someone’s site I’ve never seen. “Casino” is safe — no legitimate volunteer application mentions casinos. But “free” alone would block real content. “Buy” would block WooCommerce review discussions. The default list needs to be universally safe, which means it needs to be conservative, which means it’s small.
This is the single most useful contribution someone could make to this plugin right now: a curated, well-tested default keyword list that’s aggressive enough to catch common spam patterns but conservative enough to avoid false positives on typical WordPress sites. If you maintain a WordPress site and have access to your spam folder, the phrases in there are exactly what this list needs. Open an issue, paste the patterns, and I’ll merge them.
The admin can add site-specific keywords from the settings page (it’s a textarea, one keyword per line), but the defaults should be good enough that most sites don’t need to touch them.
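If you’re preparing a keyword contribution, this Python sketch shows roughly how a one-per-line textarea might combine with the defaults. It assumes keywords are trimmed, lowercased, and de-duplicated, and that matching is a case-insensitive substring test. The post says matching is case-insensitive but doesn’t say whether it’s substring or whole-word, so treat that part as an assumption.

```python
from typing import List, Optional

DEFAULT_KEYWORDS = [
    "casino", "poker", "viagra", "cialis",
    "crypto airdrop", "free money", "click here now",
]


def parse_textarea(raw: str) -> List[str]:
    """One keyword per line: trim whitespace, lowercase, drop blank lines."""
    return [line.strip().lower() for line in raw.splitlines() if line.strip()]


def build_blocklist(custom_raw: str) -> List[str]:
    # dict.fromkeys de-duplicates while keeping the original order.
    return list(dict.fromkeys(DEFAULT_KEYWORDS + parse_textarea(custom_raw)))


def first_match(content: str, blocklist: List[str]) -> Optional[str]:
    # Case-insensitive substring test; returns the keyword that matched, if any.
    text = content.casefold()
    for keyword in blocklist:
        if keyword.casefold() in text:
            return keyword
    return None
```

Under substring matching, a bare “free” would also hit words like “freedom”, which is one more reason it can’t go in the defaults.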
Allowlisting and the privacy model
Before any guard runs, the pipeline checks the submitter’s IP and email against an allowlist. The allowlist supports exact IPs, CIDR ranges (e.g., 10.0.0.0/8), exact email addresses, and domain patterns (e.g., @trusted.org). Allowlisted submissions bypass all guards entirely.
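These allowlist formats map neatly onto Python’s standard ipaddress module, as the sketch below shows. It classifies entries in a fixed order: an entry starting with @ is a domain pattern, an entry containing @ is an exact email, and anything else is an IP or CIDR range. That order is an assumption and not necessarily how the plugin tells entry types apart.

```python
import ipaddress
from typing import List


def is_allowlisted(ip: str, email: str, entries: List[str]) -> bool:
    email = email.strip().lower()
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        addr = None  # unparseable IP: only email entries can match
    for entry in entries:
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry.startswith("@"):
            # Domain pattern, e.g. "@trusted.org"
            if email.endswith(entry):
                return True
        elif "@" in entry:
            # Exact email address
            if email == entry:
                return True
        elif addr is not None:
            try:
                # strict=False lets a bare IP act as a single-address network.
                if addr in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                continue  # not a valid IP or CIDR; ignore the entry
    return False
```

Because the domain check includes the @, an entry like @trusted.org matches alice@trusted.org but not bob@nottrusted.org.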
The broader privacy model is simple: no data leaves the server. Form submissions are checked locally. Blocked attempts are logged to a custom database table on the site’s own database. The log captures the guard that triggered, the reason, a content excerpt, the IP, and the user agent — enough to diagnose false positives, not enough to build a surveillance profile.
The log can be disabled entirely from the settings page, and uninstall.php drops the log table and deletes all plugin options and transients. A clean uninstall leaves no orphaned data behind.
This is the part of the plugin I feel most strongly about. Spam protection and privacy should not be in tension. The reason cloud-based spam services exist is that they can aggregate data across millions of sites to build better models — and that’s genuinely effective. But for a small nonprofit handling volunteer applications and donation forms, the tradeoff of sending that data to a third party isn’t worth it. A local-only approach is good enough for the threat model, and it respects the people filling out the forms.
What’s still open
Features that would make the plugin meaningfully better, in rough order of impact:
1. A larger, better-curated default keyword list. I said this above but it bears repeating: this is the highest-leverage contribution anyone can make. The plugin’s architecture is solid; its vocabulary is anemic. If you have a collection of spam phrases from your own site, please share them.
2. A “block this” button in the log viewer. Currently, if an admin sees a blocked submission and wants to add the offending phrase to the keyword list, they have to copy it, navigate to settings, paste it into the textarea, and save. A one-click “add to blocklist” action from the log viewer would close that loop.
3. An “allowlist this” button in the log viewer. Same idea: if a legitimate submission was blocked, the admin should be able to allowlist the IP or email directly from the log entry.
4. Gravity Forms / WPForms / CF7 integrations. The normalized data layer makes this straightforward — one integration class per form plugin. I’ve only built the three surfaces I needed (comments, WC, Jetpack). If you use a different form plugin and want to contribute an integration, the architecture is ready for it.
5. A community-maintained blocklist. The most ambitious version of item 1: a shared, versioned keyword list that sites can subscribe to (via a simple GitHub-hosted JSON file, not a cloud service). Sites would pull keyword updates on a schedule without sending any data back. This preserves the no-cloud model while benefiting from collective intelligence. Not built yet, but the architecture would support it cleanly.
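This feature doesn’t exist yet, so the following Python sketch is purely hypothetical: one way the pull side could work. It assumes a JSON file shaped like {"version": 2, "keywords": [...]} and merges only when the remote version is newer. The only outbound traffic would be a plain GET for a static file, with no request body.

```python
import json
import urllib.request
from typing import List, Tuple


def fetch_payload(url: str) -> str:
    # A plain GET with no body; nothing about the site's submissions is sent.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def apply_update(local_version: int, local_keywords: List[str],
                 payload: str) -> Tuple[int, List[str]]:
    data = json.loads(payload)
    remote_version = int(data["version"])
    if remote_version <= local_version:
        return local_version, local_keywords  # nothing newer; keep local list
    incoming = [str(k).strip().lower() for k in data.get("keywords", []) if str(k).strip()]
    merged = list(dict.fromkeys(local_keywords + incoming))  # de-duplicate, keep order
    return remote_version, merged
```

In the plugin’s world, a scheduled WP-Cron task would presumably call something like fetch_payload on an interval and store whatever apply_update returns.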
Why this is the last post in the series
This series started with a post about a pet adoption plugin — the kind of thing that only makes sense if you know about the Venango County Humane Society specifically. It ends with a spam plugin that has nothing to do with animals, shelters, or northwestern Pennsylvania.
That’s the arc I want to name explicitly, because I think it generalizes.
When you start building for a specific organization — especially a small nonprofit doing cost-critical work that shouldn’t need to buy solutions off the shelf — you build the specific things first. A pet sync engine. A donation platform. An events manager. Each one is tailored to one organization’s needs, and each one is useful to other organizations with similar needs. The circle of relevance is small but real.
But along the way you inevitably build generic things too. A spam plugin. A caching pattern. A config-driven registration framework. An edit.asset.php file that you document in a blog post so the next developer doesn’t lose an hour to the same gotcha. These are the by-products of specific work that turn out to be useful to everyone.
I think this is how open-source nonprofit infrastructure actually gets built. Not by someone deciding to build “a platform for nonprofits” in the abstract, but by someone solving real problems for a real organization, publishing the solutions, and discovering that the specific and the generic are interleaved in ways you can’t predict in advance.
Six plugins, six posts, one shelter. If any of them are useful to you — whether you run a shelter, a nonprofit, a small business, or just a WordPress site that gets too much spam — I’m glad. And if you want to help make any of them better, the repos are all open and I’ll be here.
The repos:
- simple-spam-shield — this post
- vcpahumane-pet-companion — adoptable pet blocks for partner sites
- vcpahumane-petstablished-sync — Petstablished sync engine
- vcpahumane-wc-donations — open-source donations platform
- vcpahumane-shelter-events-wrapper — recurring events over The Events Calendar
All GPL. All welcome contributions of any size.
Thank you for reading this series. If you’d like to start from the beginning, the first post is here.