
  Simple Spam Shield: a lightweight, cloud-free WordPress anti-spam plugin


    This is the last post in a series about plugins I’ve built for the Venango County Humane Society. The previous five posts covered shelter-specific WordPress work — adoptable pet listings, a Petstablished sync engine, a donations platform, a recurring events manager. All of them are specialized for nonprofit use cases; none of them would make sense outside of that context.

    This post is about the one plugin in the stack that has nothing to do with animals, donations, or shelters. It’s a spam plugin. It’s called Simple Spam Shield, and the only connection to the shelter is that it was born from a shelter problem.

    Here’s the problem. Within weeks of going live, the shelter’s volunteer application form, built with Jetpack’s Contact Form blocks, started receiving thousands of fake submissions: casino spam, pharma spam, SEO link dumps, the usual. There were so many that I had to wade through hundreds of pages of garbage to find the real volunteer applications. At the time, we were relying on WordPress’s built-in list of disallowed terms.

    The obvious answer was Akismet. It’s the default WordPress spam solution, it’s free for personal sites, and it works well. But it sends every form submission to Akismet’s cloud service for analysis, which means every volunteer applicant’s name, email address, and personal information passes through a third-party server. For a nonprofit that handles sensitive community data, that felt wrong. The alternative, installing a commercial anti-spam plugin with a $50/year license, felt like the same paywall dynamic I’d been building around in the other plugins.

    So I built a spam shield. No cloud. No API keys. No external dependencies. Everything runs locally on the WordPress server. It protects comments, WooCommerce product reviews, and Jetpack Contact Forms using a small bundle of proven mechanisms that don’t require phoning home. It’s been running on vcpahumane.org for a few weeks as the site’s only spam protection.

    The repo is at github.com/jwincek/simple-spam-shield. GPL, zero-config out of the box, designed to be useful to any WordPress site that wants spam protection without a cloud dependency.

    What “simple” means

    The name is deliberate. “Simple” doesn’t mean the code is a single file; it’s a few thousand lines of PHP with config-driven guard definitions, a normalized data pipeline, and per-surface integration classes. “Simple” means the feature selection is small and focused: seven spam-detection mechanisms, three protected surfaces, one settings page, one log viewer. No machine learning, no remote API, no cloud dashboard, no premium tier.

    The thesis is that a handful of well-implemented local checks (honeypot fields, time gates, duplicate detection, nonce validation, link counting, keyword matching, and optional behavioral scoring) catch the vast majority of automated spam without sending user data off-server. They don’t catch everything; a determined human spammer will get through. But against the kind of automated form-crawling that floods small sites, they’re effective, and they preserve privacy.

    The guard pipeline

    The plugin’s core is a weighted, short-circuit pipeline called the Guard_Runner. Seven guards are defined in config/guards.json, each with a weight that determines execution order. The runner sorts the guards by weight (highest first), initializes them, and runs them sequentially against the submission data. The first guard that fails blocks the submission, and no further guards run.

    Guard          Weight  Default  What it does
    Honeypot       100     On       Hidden form field that bots fill in and humans don’t
    Duplicate      95      On       MD5 hash of content + author + email + IP, checked against a 60-second transient window
    Time Gate      90      On       Rejects submissions faster than 3 seconds after page load
    Nonce          80      On       Standard WordPress nonce verification
    Link Limit     70      On       Rejects submissions with more than 3 URLs
    Keyword Block  60      On       Case-insensitive matching against a blocklist
    Behavioral     55      Off      Scores mouse movements, clicks, and time on page
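    To make the cheaper rows of the table concrete, here is a rough sketch of four of the checks. It is written in Python rather than the plugin’s PHP, and the function names, the "|" separator in the duplicate fingerprint, the URL regex, and the in-memory dict standing in for WordPress transients are all assumptions for illustration. Only the thresholds (3 seconds, 3 URLs, a 60-second window) come from the table.

```python
import hashlib
import re
import time
from typing import Optional

TIME_GATE_SECONDS = 3   # table: faster than 3 seconds after page load is rejected
MAX_LINKS = 3           # table: more than 3 URLs is rejected
DUPLICATE_WINDOW = 60   # table: 60-second transient window

# Assumption: anything starting with http:// or https:// counts as a URL.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)

# Assumption: an in-memory dict standing in for WordPress transients
# (fingerprint -> expiry timestamp).
_seen = {}


def honeypot_ok(honeypot_value: str) -> bool:
    # Humans never see the hidden field, so it should come back empty.
    return honeypot_value == ""


def time_gate_ok(loaded_at: float, now: float) -> bool:
    # Anything faster than the threshold is too fast to be a person typing.
    return now - loaded_at >= TIME_GATE_SECONDS


def link_limit_ok(content: str) -> bool:
    return len(URL_PATTERN.findall(content)) <= MAX_LINKS


def duplicate_ok(content: str, author: str, email: str, ip: str,
                 now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    # MD5 of content + author + email + IP (the "|" separator is an assumption).
    fingerprint = hashlib.md5(
        "|".join([content, author, email, ip]).encode("utf-8")
    ).hexdigest()
    expires_at = _seen.get(fingerprint)
    if expires_at is not None and expires_at > now:
        return False  # identical submission already seen inside the window
    _seen[fingerprint] = now + DUPLICATE_WINDOW
    return True
```

    Passing `now` explicitly keeps the time-based checks testable: a second identical submission inside the 60-second window fails `duplicate_ok`, and the same submission after the window passes again.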

    The weight ordering is intentional: cheap checks run first. The honeypot is a single empty-field check, essentially free. The duplicate guard is a transient lookup. The time gate is arithmetic. By the time the runner reaches keyword matching (which involves string operations against a list), most bot submissions have already been caught and rejected by a faster check.
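    The weighted, short-circuit ordering can be sketched in a few lines. This is a Python illustration, not the plugin’s Guard_Runner; the `Guard` dataclass and the `run_guards` function are hypothetical names, and each check simply returns True when the submission looks legitimate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Guard:
    name: str
    weight: int
    enabled: bool
    check: Callable[[dict], bool]  # True means the submission passes this guard


def run_guards(guards: List[Guard], submission: dict) -> Optional[str]:
    """Return the name of the first guard that fails, or None if all pass."""
    active = [g for g in guards if g.enabled]
    # Highest weight first, so the cheapest checks run before the expensive ones.
    for guard in sorted(active, key=lambda g: g.weight, reverse=True):
        if not guard.check(submission):
            return guard.name  # short-circuit: later guards never run
    return None
```

    Because the sort puts the honeypot (weight 100) ahead of keyword matching (weight 60), a bot that fills the honeypot is rejected before any string matching happens, and a disabled guard such as Behavioral is never called at all.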

    All seven guards implement a shared Guard_Interface and extend Abstract_Guard, so adding an eighth guard is a matter of writing one class and adding an entry to guards.json. The pipeline doesn’t need to know about the new guard’s internals — just its weight and whether it’s enabled.
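    A Python analogue of the interface-plus-config arrangement might look like the sketch below. The class names, the `guard_id` self-registration hook, and the JSON shape (`id`, `weight`, `enabled`) are assumptions; the post says only that guards implement Guard_Interface, extend Abstract_Guard, and are listed in guards.json with a weight and an enabled flag.

```python
import json
import re
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class AbstractGuard(ABC):
    """Loose analogue of Guard_Interface + Abstract_Guard (names assumed)."""

    registry: Dict[str, Type["AbstractGuard"]] = {}

    def __init_subclass__(cls, guard_id: str = "", **kwargs):
        super().__init_subclass__(**kwargs)
        if guard_id:
            AbstractGuard.registry[guard_id] = cls  # each subclass registers itself

    def __init__(self, weight: int):
        self.weight = weight

    @abstractmethod
    def check(self, submission: dict) -> bool:
        """Return True if the submission passes this guard."""


class HoneypotGuard(AbstractGuard, guard_id="honeypot"):
    def check(self, submission: dict) -> bool:
        return submission.get("honeypot", "") == ""


class LinkLimitGuard(AbstractGuard, guard_id="link_limit"):
    def check(self, submission: dict) -> bool:
        return len(re.findall(r"https?://", submission.get("content", ""))) <= 3


def load_guards(config_json: str) -> List[AbstractGuard]:
    """Instantiate enabled guards from a guards.json-style string, highest weight first."""
    config = json.loads(config_json)
    guards = [
        AbstractGuard.registry[entry["id"]](entry["weight"])
        for entry in config["guards"]
        if entry.get("enabled", True) and entry["id"] in AbstractGuard.registry
    ]
    return sorted(guards, key=lambda g: g.weight, reverse=True)
```

    With `__init_subclass__` doing the registration, adding a guard stays at two steps: write the subclass and add its entry to the JSON. The loader never needs to know what the guard does internally.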

    Three surfaces, one normalized layer

    The plugin protects three different form systems, each with its own submission lifecycle:

    • WordPress comments: intercepted via the preprocess_comment filter at priority 1
    • WooCommerce product reviews: intercepted via woocommerce_new_comment (reviews are stored as comments but go through WC’s own flow)
    • Jetpack Contact Forms: intercepted via the jetpack_contact_form_is_spam filter

    Each surface has a thin integration class in includes/integrations/ whose only job is to normalize the submission data into a common format — content, author, email, plus any JS-injected fields (honeypot value, nonce, timestamp) — and pass it to the Guard_Runner. The guards never need to know which surface the data came from.

    This is the same pattern as the Petstablished sync plugin’s abilities layer: a normalized interface between the outside world and the core logic, so the core logic stays clean and testable regardless of how many input surfaces you add. If someone wanted to add protection for Gravity Forms or WPForms, they’d write one integration class, normalize the data, and the existing seven guards would apply automatically.
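    Here is one way the normalization step could look, again as a Python sketch rather than the plugin’s PHP. The `comment_content`, `comment_author`, and `comment_author_email` keys are the ones WordPress passes to preprocess_comment; the `sss_` prefix for injected fields and the Jetpack field names (`name`, `email`, `message`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

GUARD_FIELD_PREFIX = "sss_"  # hypothetical prefix for the JS-injected fields


@dataclass
class Submission:
    surface: str
    content: str
    author: str
    email: str
    ip: str
    extra: Dict[str, str] = field(default_factory=dict)  # honeypot, nonce, timestamp


def extract_guard_fields(post: Dict[str, str]) -> Dict[str, str]:
    # Keep only the injected fields, with the prefix removed.
    return {key[len(GUARD_FIELD_PREFIX):]: value
            for key, value in post.items() if key.startswith(GUARD_FIELD_PREFIX)}


def from_comment(commentdata: dict, post: dict, ip: str) -> Submission:
    # Keys as WordPress passes them to the preprocess_comment filter.
    return Submission(
        surface="comment",
        content=commentdata.get("comment_content", ""),
        author=commentdata.get("comment_author", ""),
        email=commentdata.get("comment_author_email", ""),
        ip=ip,
        extra=extract_guard_fields(post),
    )


def from_jetpack(fields: dict, ip: str) -> Submission:
    # Hypothetical field names; by this point Jetpack has stripped the injected fields.
    return Submission(
        surface="jetpack",
        content=fields.get("message", ""),
        author=fields.get("name", ""),
        email=fields.get("email", ""),
        ip=ip,
    )
```

    The guards only ever see a `Submission`, so a new form plugin means one new `from_…` adapter. The Jetpack adapter leaves `extra` empty on purpose, which is exactly the problem described in the next section.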

    The Jetpack problem (and the two-phase workaround)

    Jetpack Contact Forms are the hardest surface to protect, and the reason is worth documenting because other plugin developers will run into the same wall.

    When a Jetpack form is submitted, Jetpack’s processor recognizes only the fields defined in the form’s configuration. Any extra fields the plugin injects (the honeypot field, the nonce, the timestamp, the behavioral data) are silently stripped from $_POST before Jetpack’s spam filter fires. By the time the plugin’s callback on the jetpack_contact_form_is_spam filter runs, the JS-injected guard data is gone.

    The solution is a two-phase pipeline:

    Phase 1 runs before Jetpack processes the form. At this point, raw $_POST still contains the injected fields, so the JS-dependent guards (honeypot, nonce, time gate, behavioral) can check their data and set a rejection flag.

    Phase 2 runs during Jetpack’s own spam filter. If Phase 1 already flagged the submission, it returns immediately. Otherwise, it runs the content-based guards (keyword block, link limit, duplicate) against Jetpack’s structured form data, which is still available at this point even though the JS fields aren’t.
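    The two phases need only one piece of shared, per-request state: the rejection flag. Below is a Python sketch with a hypothetical class name and guards modeled as plain functions that return True when the data passes. This post does not name the hook that drives Phase 1, so the sketch leaves the WordPress wiring out.

```python
from typing import Callable, List

Check = Callable[[dict], bool]  # returns True when the data passes the guard


class TwoPhaseShield:
    """Hypothetical per-request object shared by both phases."""

    def __init__(self, js_guards: List[Check], content_guards: List[Check]):
        self.js_guards = js_guards            # honeypot, nonce, time gate, behavioral
        self.content_guards = content_guards  # keyword block, link limit, duplicate
        self.rejected = False

    def phase_one(self, raw_post: dict) -> None:
        # Runs before Jetpack processes the form, while the injected fields still exist.
        self.rejected = not all(guard(raw_post) for guard in self.js_guards)

    def phase_two(self, form_fields: dict) -> bool:
        """Answer the is-spam question: True means spam."""
        if self.rejected:
            return True  # Phase 1 already flagged it
        return not all(guard(form_fields) for guard in self.content_guards)
```

    `all()` stops at the first failing guard, so each phase short-circuits the same way the main pipeline does.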

    The guards themselves handle the edge case gracefully: if a guard expects a field that’s missing (because Jetpack stripped it), it skips rather than hard-failing. This means even if Phase 1 doesn’t fire for some reason, the content-based guards still protect the form. Defense in depth, with graceful degradation.
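    One way to express “skip rather than hard-fail” is a three-valued result in which only an explicit failure blocks. This Python sketch assumes the field names `honeypot` and `loaded_at` for the injected data; they are placeholders, not the plugin’s real field names.

```python
from enum import Enum
from typing import List


class Result(Enum):
    PASS = "pass"
    FAIL = "fail"
    SKIP = "skip"  # the guard's input was missing (for example, stripped by Jetpack)


def honeypot(data: dict) -> Result:
    if "honeypot" not in data:
        return Result.SKIP
    return Result.PASS if data["honeypot"] == "" else Result.FAIL


def time_gate(data: dict, now: float, min_seconds: float = 3.0) -> Result:
    if "loaded_at" not in data:
        return Result.SKIP
    elapsed = now - float(data["loaded_at"])
    return Result.PASS if elapsed >= min_seconds else Result.FAIL


def is_blocked(results: List[Result]) -> bool:
    # Only an explicit FAIL blocks; SKIP means "no opinion", not "spam".
    return Result.FAIL in results
```

    The cost of this design is that a bot which never sends the injected fields also gets SKIP from the JS-dependent guards, which is why the content-based guards in Phase 2 still matter.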

    This is the kind of integration problem that doesn’t show up in documentation or tutorials. Jetpack’s field-stripping behavior is undocumented and invisible until you try to inject custom fields into a form submission and watch them disappear. If you’re building any plugin that needs to intercept Jetpack form data, plan for this.

    The keyword list problem

    Let me be honest about the plugin’s current biggest weakness: the default keyword blocklist has seven entries.

    ‘casino, poker, viagra, cialis, crypto airdrop, free money, click here now’

    Seven words. That’s the out-of-the-box protection against keyword-based spam. The other six guards (honeypot, time gate, nonce, duplicate, link limit, behavioral) don’t depend on keywords and work fine, but keyword blocking is the guard that catches content-aware spam: submissions crafted to look human but containing telltale phrases. Seven keywords is not enough for that.

    The reason it’s seven is that I’ve been cautious about false positives. Every keyword added to the default list could block a legitimate submission on a site I’ve never seen. “Casino” is safe: no legitimate volunteer application mentions casinos. But “free” alone would block real content, and “buy” would block WooCommerce review discussions. The default list needs to be universally safe, which means it needs to be conservative, which means it’s small.

    This is the single most useful contribution someone could make to this plugin right now: a curated, well-tested default keyword list that’s aggressive enough to catch common spam patterns but conservative enough to avoid false positives on typical WordPress sites. If you maintain a WordPress site and have access to your spam folder, the phrases in there are exactly what this list needs. Open an issue, paste the patterns, and I’ll merge them.

    The admin can add site-specific keywords from the settings page (it’s a textarea, one keyword per line), but the defaults should be good enough that most sites don’t need to touch them.
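    Here is a sketch of how the defaults and the settings textarea could combine into a single case-insensitive matcher, in Python. The seven default keywords are the ones listed above; the function names are hypothetical, and the `\b` word boundaries are an assumption, since the post says only “case-insensitive matching”.

```python
import re
from typing import List, Optional

DEFAULT_KEYWORDS = ["casino", "poker", "viagra", "cialis",
                    "crypto airdrop", "free money", "click here now"]


def parse_textarea(text: str) -> List[str]:
    """One keyword per line; blank lines and surrounding whitespace are ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def first_blocked_keyword(content: str, site_keywords: str = "") -> Optional[str]:
    """Return the first blocked keyword found in content, or None."""
    keywords = DEFAULT_KEYWORDS + parse_textarea(site_keywords)
    if not keywords:
        return None
    # Assumption: whole-word matching via \b; the plugin may use plain substrings.
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    match = pattern.search(content)
    return match.group(0) if match else None
```

    Word boundaries cut some false positives (“freedom” cannot match a “free” entry) but also miss plurals: `casino` will not match “casinos”. Plain substring matching makes the opposite trade, which is one more reason the default list has to be chosen carefully.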

    Allowlisting and the privacy model

    Before any guard runs, the pipeline checks the submitter’s IP and email against an allowlist. The allowlist supports exact IPs, CIDR ranges (e.g., 10.0.0.0/8), exact email addresses, and domain patterns (e.g., @trusted.org). Allowlisted submissions bypass all guards entirely.
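    The four allowlist forms map neatly onto Python’s standard `ipaddress` module. This is a sketch under assumptions: the function name is hypothetical, entries are compared case-insensitively, and a domain pattern like @trusted.org matches that exact domain only (whether the plugin also matches subdomains is not stated).

```python
import ipaddress
from typing import List


def is_allowlisted(ip: str, email: str, entries: List[str]) -> bool:
    """Entries can be exact IPs, CIDR ranges, exact emails, or @domain patterns."""
    email = email.strip().lower()
    try:
        addr = ipaddress.ip_address(ip.strip())
    except ValueError:
        addr = None  # missing or malformed IP: only email rules can match

    for raw in entries:
        entry = raw.strip().lower()
        if not entry:
            continue
        if entry.startswith("@"):
            # Domain pattern: compare the part after the last "@" exactly.
            if "@" in email and email.rpartition("@")[2] == entry[1:]:
                return True
        elif "@" in entry:
            if email == entry:  # exact email address
                return True
        elif addr is not None:
            try:
                # A bare IP parses as a single-address network (/32 or /128),
                # so exact IPs and CIDR ranges share one code path.
                if addr in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                continue  # neither an IP nor a CIDR range; ignore it
    return False
```

    Comparing the domain after the last “@” exactly, rather than testing `email.endswith("trusted.org")`, keeps jane@untrusted.org from slipping through.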

    The broader privacy model is simple: no data leaves the server. Form submissions are checked locally. Blocked attempts are logged to a custom table in the site’s own database. The log captures the guard that triggered, the reason, a content excerpt, the IP, and the user agent: enough to diagnose false positives, not enough to build a surveillance profile.

    The log can be disabled entirely from the settings page, and uninstall.php drops the log table and deletes all plugin options and transients. A clean uninstall leaves no orphaned data behind.
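    The log-and-uninstall lifecycle can be sketched with Python’s built-in `sqlite3` standing in for the WordPress database. The table name `spam_log`, the 200-character excerpt, and the function names are assumptions; a real WordPress plugin would create its table through `$wpdb` using the site’s table prefix.

```python
import sqlite3

EXCERPT_LENGTH = 200  # assumption: the post doesn't say how long the excerpt is


def create_log_table(db: sqlite3.Connection) -> None:
    db.execute(
        """
        CREATE TABLE IF NOT EXISTS spam_log (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            logged_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            guard      TEXT NOT NULL,
            reason     TEXT NOT NULL,
            excerpt    TEXT NOT NULL,
            ip         TEXT NOT NULL,
            user_agent TEXT NOT NULL
        )
        """
    )
    db.commit()


def log_block(db: sqlite3.Connection, guard: str, reason: str, content: str,
              ip: str, user_agent: str, enabled: bool = True) -> None:
    if not enabled:
        return  # logging turned off in settings: nothing is written
    db.execute(
        "INSERT INTO spam_log (guard, reason, excerpt, ip, user_agent) "
        "VALUES (?, ?, ?, ?, ?)",
        (guard, reason, content[:EXCERPT_LENGTH], ip, user_agent),
    )
    db.commit()


def uninstall(db: sqlite3.Connection) -> None:
    # Same idea as uninstall.php dropping the table: no orphaned rows left behind.
    db.execute("DROP TABLE IF EXISTS spam_log")
    db.commit()
```

    Only an excerpt of the content is stored, and the `enabled` flag mirrors the settings-page switch: when logging is off, nothing is written at all.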

    This is the part of the plugin I feel most strongly about. Spam protection and privacy should not be in tension. Cloud-based spam services exist because they can aggregate data across millions of sites to build better models, and that’s genuinely effective. But for a small nonprofit handling volunteer applications and donation forms, the tradeoff of sending that data to a third party isn’t worth it. A local-only approach is good enough for the threat model, and it respects the people filling out the forms.

    What’s still open

    Features that would make the plugin meaningfully better, in rough order of impact:

    1. A larger, better-curated default keyword list. I said this above but it bears repeating: this is the highest-leverage contribution anyone can make. The plugin’s architecture is solid; its vocabulary is anemic. If you have a collection of spam phrases from your own site, please share them.

    2. A “block this” button in the log viewer. Currently, if an admin sees a blocked submission and wants to add the offending phrase to the keyword list, they have to copy it, navigate to settings, paste it into the textarea, and save. A one-click “add to blocklist” action from the log viewer would close that loop.

    3. An “allowlist this” button in the log viewer. Same idea: if a legitimate submission was blocked, the admin should be able to allowlist the IP or email directly from the log entry.

    4. Gravity Forms / WPForms / CF7 integrations. The normalized data layer makes this straightforward: one integration class per form plugin. I’ve only built the three surfaces I needed (comments, WC, Jetpack). If you use a different form plugin and want to contribute an integration, the architecture is ready for it.

    5. A community-maintained blocklist. The most ambitious version of item 1: a shared, versioned keyword list that sites can subscribe to (via a simple GitHub-hosted JSON file, not a cloud service). Sites would pull keyword updates on a schedule without sending any data back. This preserves the no-cloud model while benefiting from collective intelligence. Not built yet, but the architecture would support it cleanly.
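    The pull-only update described in item 5 is mostly a merge problem. Here is a Python sketch, assuming a hypothetical JSON shape with a `version` integer and a `keywords` list. Fetching the file (a plain HTTP GET of the GitHub-hosted JSON) is left out, and nothing in this function sends data anywhere.

```python
import json


def apply_remote_blocklist(local: dict, remote_json: str) -> dict:
    """Merge a downloaded keyword list into local state and return the new state.

    Both sides use the hypothetical shape {"version": int, "keywords": [str, ...]}.
    Nothing is sent anywhere; this only reads what was already downloaded.
    """
    remote = json.loads(remote_json)
    if remote.get("version", 0) <= local.get("version", 0):
        return local  # not newer than what we already have

    merged, seen = [], set()
    for keyword in local.get("keywords", []) + remote.get("keywords", []):
        key = keyword.strip().lower()
        if key and key not in seen:  # case-insensitive de-duplication
            seen.add(key)
            merged.append(keyword.strip())
    return {"version": remote["version"], "keywords": merged}
```

    This keeps every keyword already present locally, so a keyword removed upstream is not removed here; a fuller implementation would probably track upstream and site-specific keywords separately.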

    Why this is the last post in the series

    This series started with a post about a pet adoption plugin, the kind of thing that only makes sense if you know about the Venango County Humane Society specifically. It ends with a spam plugin that has nothing to do with animals, shelters, or northwestern Pennsylvania.

    That’s the arc I want to name explicitly, because I think it generalizes.

    When you start building for a specific organization, especially a small nonprofit doing cost-critical work that shouldn’t need to buy solutions off the shelf, you build the specific things first. A pet sync engine. A donation platform. An events manager. Each one is tailored to one organization’s needs, and each one is useful to other organizations with similar needs. The circle of relevance is small but real.

    But along the way you inevitably build generic things too. A spam plugin. A caching pattern. A config-driven registration framework. An edit.asset.php file that you document in a blog post so the next developer doesn’t lose an hour to the same gotcha. These are the by-products of specific work that turn out to be useful to everyone.

    I think this is how open-source nonprofit infrastructure actually gets built. Not by someone deciding to build “a platform for nonprofits” in the abstract, but by someone solving real problems for a real organization, publishing the solutions, and discovering that the specific and the generic are interleaved in ways you can’t predict in advance.

    Six plugins, six posts, one shelter. If any of them are useful to you, whether you run a shelter, a nonprofit, a small business, or just a WordPress site that gets too much spam, I’m glad. And if you want to help make any of them better, the repos are all open and I’ll be here.


    The repos:

    All GPL. All welcome contributions of any size.


    Thank you for reading this series. If you’d like to start from the beginning, the first post is here.