Detonate First, Ask Questions Never: Auto-Quarantining Phishing URLs

Every reported phishing email used to land on someone’s desk with the same unspoken question attached: “Is this link actually going to ruin my afternoon?” And the only way to answer it was to poke at the URL by hand, which is exactly the kind of work nobody wants and machines don’t mind.

I’d already built a Gmail add-on that lets users report a suspicious email with one click — it captures the message and ships it off for analysis. This pipeline is what happens next: it takes the URLs out of those reported emails, detonates them in a sandbox, reads the verdict, and if the thing is malicious it pulls the message out of inboxes automatically. Less manual triage, faster containment, and one fewer reason for an analyst to sigh audibly.

The Problem: Manual Triage Doesn’t Scale (and Neither Does Patience)

Here’s the workflow before automation: a user reports a suspicious email, it goes into a queue, and eventually a human opens it, copies the URL, and runs it through some analysis tool by hand. Then they decide whether to act. By the time all that happens, the email has been sitting in a bunch of inboxes for however long, quietly waiting for someone to click it.

The bottleneck isn’t the analysis itself — sandboxes are good at that. The bottleneck is the human in the loop doing the boring part: extracting the URL, submitting it, waiting, checking back, and finally taking action. That’s a sequence of steps a script can do without getting bored or distracted by Slack.

The goal of this project was narrow on purpose:

Pull suspicious URLs out of reported phishing emails.
Detonate them in a sandbox and get a verdict.
If malicious, auto-quarantine the message.

No grand SOAR platform. Just the most painful manual step, automated.

The Stack: A Reported Email Meets a Detonation Sandbox

Two pieces do the heavy lifting here. The first is the Gmail add-on I already had: when a user reports a suspicious email, the add-on packages it up (subject, sender, body) and drops it onto a webhook. So I never poll inboxes or run the Gmail API on a timer hunting for reports; the reports come to me, already structured, the moment a human flags one.

The second piece is the detonation sandbox that opens the suspicious link in an isolated environment and tells me whether it’s hostile. I used Proofpoint for detonation (it was already in our stack), but the pipeline doesn’t care which engine sits there; it just needs a verdict back. If you’d rather not pay for one, urlscan.io has a free API tier that’s excellent for URLs specifically, and Cuckoo and CAPE Sandbox are the open-source heavyweights if you want to run detonation yourself.
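On the receiving end, the webhook’s first job is to turn the add-on’s POST body into the report dict the rest of the pipeline reads, and to reject anything malformed before it reaches a sandbox. Here’s a minimal sketch, assuming the payload is a JSON object with subject, sender, body, and message_id fields; those field names and the `parse_report` helper are my assumptions, not the add-on’s documented contract.

```python
import json

# Fields the rest of the pipeline reads (assumed, not a documented schema)
REQUIRED_FIELDS = ("subject", "sender", "body", "message_id")


def parse_report(raw: bytes) -> dict:
    """Decode and validate one reported-email payload from the webhook.

    Raises ValueError for anything malformed, so the HTTP handler can
    answer 400 instead of passing junk into the detonation step.
    """
    try:
        report = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"payload is not valid UTF-8 JSON: {exc}") from exc

    if not isinstance(report, dict):
        raise ValueError("payload must be a JSON object")

    missing = [f for f in REQUIRED_FIELDS if f not in report]
    if missing:
        raise ValueError(f"payload missing fields: {', '.join(missing)}")

    # The body may be an empty string; the other fields must be non-empty strings
    for field in REQUIRED_FIELDS:
        value = report[field]
        if not isinstance(value, str) or (field != "body" and not value):
            raise ValueError(f"field {field!r} must be a non-empty string")

    return report
```

Whatever web framework hosts the webhook, the handler only needs to call this on the raw request body and answer 400 when it raises.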

The mental model is a relay race. The add-on hands me a reported email. I pull the URLs and hand them to the sandbox. The sandbox runs them somewhere safe and hands back a verdict. My code takes that verdict and does something about it — just the baton being passed down the line.

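# Worse verdicts rank higher, so we can keep the most severe one seen so far
SEVERITY = {
    "unlikely_threat": 0,
    "manual_review_likely_benign": 1,
    "manual_review": 2,
    "threat": 3,
}

def process_report(report):
    # 'report' is the JSON the Gmail add-on captured: subject, sender, body, message_id
    urls = extract_urls(report)

    # Detonate each URL and act on the worst verdict
    worst = "unlikely_threat"
    for url in urls:
        verdict = detonate(url)

        if verdict == "threat":
            quarantine(report["message_id"])   # clear malicious -> pull it now
            return                              # one bad URL is enough
        if SEVERITY[verdict] > SEVERITY[worst]:
            worst = verdict                     # remember it, but keep checking the other URLs

    if worst in ("manual_review", "manual_review_likely_benign"):
        route_to_analyst(report["message_id"], worst)   # ambiguous -> a human decides
    # "unlikely_threat" on everything -> leave the message alone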

That’s the entire thing in spirit. The rest of this post is just filling in the boxes with slightly more honest code.

Extracting URLs From the Reported Email

Before you can detonate anything you have to find the links, and phishing emails are not known for their tidy HTML. The add-on already handed me the email body; the job now is digging the links out of it — and they hide in anchor tags, in plain text, in tracking redirects, and occasionally in places that make you question the sender’s life choices.

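import html
import re

# Stop at whitespace, quotes, angle brackets, or ")" so surrounding markup isn't swallowed
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")

def extract_urls(report):
    """Pull URLs out of the email body the add-on captured."""
    # Decode HTML entities first so "&amp;" inside an href becomes a real "&"
    body = html.unescape(report.get("body", ""))
    # Trailing sentence punctuation ("...log in at https://x.example/login.") isn't part of the URL
    urls = {u.rstrip(".,;:!?") for u in URL_PATTERN.findall(body)}

    # Drop obvious noise (unsubscribe links, image trackers, etc.)
    # This is a judgment call and you will get it wrong sometimes.
    # sorted() keeps the detonation order stable from run to run
    return sorted(u for u in urls if not is_clearly_benign(u))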

A regex is a blunt instrument and I’m aware of it. It’ll occasionally grab a stray URL that isn’t the payload, and a determined attacker can obfuscate links so the regex misses them entirely. But for the bulk of reported phishing — the lazy, high-volume stuff — pulling visible http(s) links gets you most of the way there. (The is_clearly_benign filter is where you’d strip out unsubscribe links and image-tracking pixels so you don’t waste detonations on them.)
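A minimal sketch of that filter, assuming noise can be recognized from the URL alone: an unsubscribe or preference-center path, or a plain image file of the kind used for tracking pixels. The hint lists are illustrative placeholders, not a vetted blocklist.

```python
from urllib.parse import urlsplit

# Illustrative hints only; tune these against your own report traffic
UNSUBSCRIBE_HINTS = ("unsubscribe", "optout", "opt-out", "email-preferences")
PIXEL_EXTENSIONS = (".gif", ".png", ".jpg", ".jpeg")


def is_clearly_benign(url: str) -> bool:
    """Return True for URLs not worth a detonation slot (unsubscribe links, image pixels)."""
    parts = urlsplit(url)
    path = parts.path.lower()

    # Unsubscribe / preference-center links: look in both the path and the query string
    haystack = path + "?" + parts.query.lower()
    if any(hint in haystack for hint in UNSUBSCRIBE_HINTS):
        return True

    # Tracking pixels and inline images are image files, not landing pages
    if path.endswith(PIXEL_EXTENSIONS):
        return True

    return False
```

Keep this list short: every URL it filters is a URL the sandbox never sees, and nothing stops an attacker from parking a credential page at a path containing “unsubscribe”.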

Detonating the URL and Reading the Verdict

This is the part that earns the project its dramatic title. Each candidate URL gets submitted to the sandbox, which opens it in an isolated environment and watches what happens — redirects, downloads, credential-harvesting forms, all the greatest hits. Then it hands back a verdict.

import time

POLL_INTERVAL = 10  # seconds between status checks; tune for your sandbox

def detonate(url):
    """Submit a URL to the sandbox and wait for the verdict."""
    # sandbox_api is a stand-in for your sandbox vendor's client
    job = sandbox_api.submit(url)

    # Detonation isn't instant; poll until it finishes.
    while not job.is_complete():
        time.sleep(POLL_INTERVAL)
        job.refresh()

    # one of: "threat" | "unlikely_threat"
    #         | "manual_review" | "manual_review_likely_benign"
    return job.verdict

The important design note: detonation takes time. You submit a URL, the sandbox does its thing, and you check back. So this is inherently a polling flow, not a fire-and-forget one. In practice you want a sane timeout so a slow or stuck job doesn’t wedge the whole pipeline waiting forever for a link that’s never going to answer.
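One way to add that timeout is a deadline on the polling loop. This sketch assumes the same job object as in `detonate()` (`is_complete()`, `refresh()`, `.verdict`). Mapping a timed-out job to `manual_review` is my own choice here, so a stuck detonation lands with an analyst instead of silently counting as clean; the `wait_for_verdict` name and the default timings are hypothetical.

```python
import time


def wait_for_verdict(job, timeout=300.0, poll_interval=10.0):
    """Poll a sandbox job until it finishes or the deadline passes.

    `job` is assumed to expose is_complete(), refresh(), and .verdict.
    Returns the job's verdict, or "manual_review" if the deadline passes first.
    """
    # monotonic() can't jump backwards if the wall clock is adjusted
    deadline = time.monotonic() + timeout
    while not job.is_complete():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Slow or stuck job: hand it to a human instead of waiting forever
            return "manual_review"
        # Never sleep past the deadline
        time.sleep(min(poll_interval, remaining))
        job.refresh()
    return job.verdict
```

With this in place, the polling loop in `detonate()` collapses to `return wait_for_verdict(job)`.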

The verdict isn’t a yes/no — and that nuance is what makes it safe to automate. The sandbox comes back with one of four calls, and each maps to a different action:

threat — clearly malicious. Auto-quarantine, no human needed.
unlikely_threat — analysis came back clean. Leave the message alone.
manual_review — ambiguous; the sandbox isn’t sure. Route it to an analyst instead of guessing.
manual_review_likely_benign — flag for a human, but the analysis leans clean, so it lands as low-priority rather than a fire drill.

The whole point of those middle two is that the bot only takes irreversible action (yanking mail) when the verdict is unambiguous. Anything murky gets handed to a person — which is exactly where the judgment should live.
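The four-way mapping is small enough to keep as a lookup table, which puts the auto-act line in one obvious place. A sketch with two assumptions of mine: the `Action` names are hypothetical, and any verdict string the table doesn’t recognize falls back to an analyst, in keeping with the rule that only an unambiguous threat triggers automatic action.

```python
from enum import Enum


class Action(Enum):
    QUARANTINE = "quarantine"    # the only automatic action: pull it from inboxes
    ANALYST = "analyst"          # a human decides
    ANALYST_LOW = "analyst_low"  # a human decides, low priority
    NONE = "none"                # leave the message alone


# Only "threat" maps to an automatic action; everything murky goes to a person
VERDICT_ACTIONS = {
    "threat": Action.QUARANTINE,
    "manual_review": Action.ANALYST,
    "manual_review_likely_benign": Action.ANALYST_LOW,
    "unlikely_threat": Action.NONE,
}


def action_for(verdict: str) -> Action:
    """Map a sandbox verdict to an action; unknown verdicts fail safe to a human."""
    return VERDICT_ACTIONS.get(verdict, Action.ANALYST)
```

Defaulting unknown verdicts to `ANALYST` rather than `NONE` matters if the vendor ever adds or renames a verdict: the worst case becomes extra analyst work, not a malicious message left sitting in inboxes.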

Acting on the Verdict: Quarantine and Escalate

When the verdict is threat, the response is the same containment the add-on’s pipeline already uses for malicious mail: pull the message out of every inbox before anyone clicks, and flag it for the SOC. The detonation is just the evidence-gathering; this is the arrest.

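import logging

log = logging.getLogger(__name__)

def quarantine(message_id):
    """Pull a malicious message out of every inbox and flag it for SOC review."""
    # workspace is a thin wrapper over the Gmail API; under the hood it has to
    # find the message in each recipient's mailbox and use label IDs, not names
    workspace.modify_message(
        message_id,
        add_labels=["QUARANTINE"],
        remove_labels=["INBOX"],
    )
    log.warning("Quarantined %s after malicious verdict", message_id)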

Pulling it out of INBOX and dropping it under a QUARANTINE label means the email stops being a live threat without being destroyed outright — you keep it around for the security team to review, which matters when someone inevitably asks “wait, what did you delete and why.” The key win is speed: the gap between “user reports it” and “it’s out of everyone’s inbox” shrinks from however-long-a-human-takes down to however-long-the-sandbox-takes.
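If `workspace.modify_message` is backed by the Gmail API, two details shape it. Gmail message IDs are local to each mailbox, so “every inbox” means finding the message in each recipient’s mailbox, for example by searching on the RFC 822 Message-ID header with the `rfc822msgid:` operator. And labels are added and removed by label ID, not display name. A sketch under those assumptions, taking an already-authorized `googleapiclient` Gmail service acting as the given user (e.g. via domain-wide delegation); the function name and the label ID are placeholders.

```python
def quarantine_in_mailbox(service, user_id, rfc822_msgid, quarantine_label_id):
    """Move one message out of INBOX and under the quarantine label in a single mailbox.

    service: an authorized Gmail API client acting as user_id (assumed).
    rfc822_msgid: the Message-ID header value, which is the same in every mailbox.
    quarantine_label_id: the ID (not the display name) of your quarantine label.
    Returns the mailbox-local message IDs that were modified.
    """
    # Search by the global Message-ID header; angle brackets aren't needed in the query
    query = "rfc822msgid:" + rfc822_msgid.strip("<>")
    resp = service.users().messages().list(userId=user_id, q=query).execute()
    ids = [m["id"] for m in resp.get("messages", [])]

    for msg_id in ids:
        # Relabel instead of deleting, so the SOC can still review (or release) it
        service.users().messages().modify(
            userId=user_id,
            id=msg_id,
            body={
                "addLabelIds": [quarantine_label_id],
                "removeLabelIds": ["INBOX"],
            },
        ).execute()
    return ids
```

The caller loops this over the recipients of the reported message; the quarantine label’s ID can be looked up once per mailbox with `users.labels.list`.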

Why This Matters (and Where It’s Held Together With Tape)

The business case is simple: phishing is a time-to-containment game. The faster a malicious message leaves inboxes, the fewer chances anyone has to click the link, enter their credentials, and turn one reported email into an incident. Automating the detonation-to-quarantine path takes the slowest, most manual part of phishing response and makes it run at machine speed.

That said, I’m not going to pretend this is bulletproof:

URL extraction is regex-based, so cleverly obfuscated links can slip past it, and it can occasionally grab the wrong link.
It leans on the sandbox’s verdict — a false threat quarantines a legitimate email, and a false unlikely_threat lets a bad one survive. The manual_review tiers blunt this (anything ambiguous goes to a human), but a confidently wrong verdict still acts on its own.
Polling has limits. A sandbox that’s slow or down stalls the verdict, and the email stays put until it answers.
It only sees what users report. The pipeline starts at the add-on, so phishing nobody clicks “report” on never enters it — this is containment for reported mail, not a detection net.

For the volume of lazy, obvious phishing that floods a help desk, though, this clears out the bulk of it without a human lifting a finger — which frees up the analysts for the genuinely weird stuff that actually needs a brain.

If you’ve built something similar: where do you draw the auto-act line? I only let the bot quarantine on an unambiguous threat and push everything murkier to a human — but I’d love to know how aggressive everyone else lets theirs be.