No, you can’t vibecode an AI-driven threat hunting pipeline

Push uses commercial AI models to deliver agentic threat hunting. Can’t you just build something yourself with those same models? Well, no.

What would it take to vibecode your own AI-driven threat hunting pipeline?

The commercial models are right there. You’ve probably got a spare weekend coming up, a really nice espresso machine, and a few bucks for tokens. (Is there already an HGTV series on this?)

We recently published a detailed look at how we use AI agents as a force multiplier for Push’s threat hunting and detection engineering capabilities. One intriguing detail you might have noticed in that article is that at Push, we treat commercial AI models as commoditized infrastructure, akin to cloud computing.

So it’s a cheeky question, but a fair one, because if Push is using commercial models, what exactly are you paying for?

It turns out that the models are the easiest things to replace, and in fact we swap out different models with little impact on detection performance. What’s much harder to build is the expertise: the technical knowledge of various attack techniques, the instrumentation in the browser that produces the structured telemetry, and the enforcement layer that turns detections into real-time protection.

Let’s break it down.

The promise and the perils of threat hunting (and where agentic capabilities fit in)

Threat hunting — the practice of proactively searching for threats that haven’t been seen before — is one of the most effective practices in security and one of the least accessible.

The SANS 2025 Threat Hunting Survey found that 61% of organizations cite staffing shortages as the top barrier to running a hunting program. A single manual hunt takes 10 to 20 hours of sustained analyst focus — forming hypotheses about what an attacker might be doing, querying data sources sequentially, correlating results by hand, documenting findings. Many organizations hunt infrequently or not at all.

Threat hunting in the browser poses specific challenges: The stakes are high as AI-enabled attacks accelerate, and the availability of training and knowledge is low.

AiTM phishing kits that manipulate DOM elements in real time, ClickFix variants that inject malicious payloads through clipboard manipulation, ConsentFix attacks that abuse OAuth consent flows, credential harvesting on pages that rotate infrastructure hourly — these techniques don't map cleanly onto the endpoint-focused threat models most SOC teams were built around, or the data sources they’re used to interrogating.

Even well-staffed security organizations tend to have a blind spot in the browser layer because the expertise required to hunt there is specialized and the telemetry to support it hasn't historically been available.

Using AI agents to hunt for browser-based threats promises a net-new capability for smaller teams without dedicated threat hunting staff. For larger enterprises, the value of an agentic threat hunting capability lies in its ability to provide (or augment) expertise on emerging attack methods.

Most SOC teams have deep expertise at the endpoint, IdP, cloud, and network layers, built over years of working with those systems’ telemetry and workflows. But browser-based attacks operate in a different domain with different telemetry, different TTPs, and a different evasion model.

A capability like Push’s addresses three hurdles at once: it supplies the missing expertise, adds no burden on staff, and works at a speed that matches the acceleration we’re currently witnessing in browser-based attack techniques.

This isn’t chatbot log analysis

When you hear “AI-powered threat hunting,” you might imagine an AI copilot sitting on top of your SIEM, summarizing alerts and correlating log entries faster than a human analyst could. It’s a fair assumption because many products use this kind of implementation, and tools like those are useful.

That’s not what we built at Push.

If you’re not familiar with Push, it’s a browser security platform deployed as an extension that detects and stops advanced browser-based attacks while also providing visibility and control over shadow apps and identities, including AI usage. You can use the same telemetry Push provides for these use cases to perform data loss and insider risk investigations, too.

What we built is an agentic threat hunting and detection pipeline where AI agents collaborate with in-house threat researchers to continuously hunt for emerging browser-based attack techniques across our customer base, and then automatically write and deploy new detections.

Our pipeline differs from AI-enabled log analysis in three key ways:

A new telemetry source is the foundation

First, the Push platform generates its own telemetry. The Push browser extension operates as a flight recorder, locally collecting browser session metadata that doesn’t exist anywhere else in the security stack — details like DOM structure, script execution contexts, redirect chains, credential entry behavior, OAuth consent flows, and network requests observed from inside the session.

This metadata is stored locally and only queried during targeted threat hunts, preserving user and customer privacy.

Proactive hunting, not just reactive triage

Second, the pipeline hunts proactively, rather than triaging alerts reactively as log analysis agents do.

Push agents generate hypotheses, craft queries against the telemetry corpus, run them across millions of browsers, and triage the results — searching for techniques that haven't triggered any existing alert or rule.

The InstallFix discovery described in the original agentic threat hunting article is the clearest example: The Push pipeline surfaced 12 meaningful results from trillions of browser events, and one of them was a novel attack technique. That's threat hunting at machine scale, not just alert triage.

Not just analysis, but new detections, too

Finally, the output isn’t (only) a natural-language summary of what the agents found. It’s a production detection rule that ships to every Push customer and wires into real-time enforcement controls defined by Push admins.

The pipeline's job isn’t to help you understand an alert faster. Its job is to produce detection rules that didn't exist before, fast enough to address emerging attack techniques and organization-specific campaigns within minutes.

Agentic threat hunting as core product infrastructure

The nice thing about commercially available AI models is that they’re really good at understanding web code. That arcane JavaScript function you’d have to look up in the docs? They recognize it immediately. That makes them perfectly suited to provide domain knowledge that can be harnessed with the right security expertise.

Using commercial models in our agentic detection pipeline then becomes a force multiplier for our research team’s understanding of TTPs — not a security engine in and of itself.

The four core components of our agentic pipeline can’t be replaced by using the same models we do, because the value is not in the models, but in the product infrastructure, product telemetry, and research expertise those models capitalize on.

High-level view of our agentic threat hunting pipeline.

Component 1: The flight recorder

We deploy as a browser extension (not a separate browser, a proxy, or an endpoint agent), which means we sit inside the browser session itself, seeing what the user sees.

A component of the extension acts as a flight recorder, collecting and locally storing browser-level metadata: DOM elements, tab context, script execution, network traffic, user actions, credential entry, and more. This body of structured browser event metadata is the searchable landscape for every hunt.

That's a data source most security teams have never had access to. You can't get it from an endpoint agent, a network proxy, or a cloud access log, because it doesn't exist outside the browser session. Turns out, it matters more than the model itself: When the model has this full browser context — the DOM, redirect chains, user behavior — it can reason about what happened. When it has to start guessing at those details, it starts hallucinating.

The Push extension, the telemetry it collects, and the scale (3 million browsers and counting) at which it operates are the underlying capabilities that make our agentic threat hunting possible.

Component 2: The internal knowledge base

As we mentioned earlier, commercial LLMs understand web code exceedingly well. What they don’t know is which patterns in that code indicate a credential-harvesting AiTM kit versus a legitimate login page, or which redirect behavior signals an InstallFix lure versus a normal marketing funnel.

That distinction comes from our internal knowledge base — years of TTP analysis, curated libraries of traces from real phishing kits encountered in the wild, and hunt parameters refined through hundreds of investigations led by our experienced human research team.

This knowledge base also reflects a deliberate architectural choice.

As our CPO Jacques Louw put it on Risky Business: "There's no list of bad domains anywhere in the product. It's a crutch — a false cheat code that stops you from doing the detection in the way that actually is resilient, because the next time you see it, it will be on a different domain."

Our knowledge base encodes behavioral patterns and TTP signatures instead, which means detections remain effective even as infrastructure rotates underneath them.
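To make the contrast concrete, here is a minimal Python sketch of why a behavioral check survives infrastructure rotation while a domain list does not. Everything in it is an illustrative assumption: the PageEvent fields, the three signals, and the example domains are hypothetical stand-ins, not Push's actual telemetry schema or detection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageEvent:
    """Hypothetical, simplified view of one observed page (not a real schema)."""
    domain: str
    has_password_field: bool
    form_posts_cross_origin: bool
    clones_known_login_markup: bool


# A domain list only knows about infrastructure that was already seen.
BLOCKLIST = {"evil-login.example"}


def blocklist_match(event: PageEvent) -> bool:
    """Infrastructure-based check: matches only previously listed domains."""
    return event.domain in BLOCKLIST


def behavioral_match(event: PageEvent) -> bool:
    """Behavior-based check: looks at what the page does, not where it lives.

    The three signals are assumptions chosen for illustration only.
    """
    return (
        event.has_password_field
        and event.form_posts_cross_origin
        and event.clones_known_login_markup
    )


# The same kit, before and after the attacker rotates to a new domain.
first_seen = PageEvent("evil-login.example", True, True, True)
rotated = PageEvent("fresh-domain.example", True, True, True)

print(blocklist_match(first_seen), blocklist_match(rotated))    # True False
print(behavioral_match(first_seen), behavioral_match(rotated))  # True True
```

The blocklist misses the rotated page because only the domain changed, while the behavioral check still fires because the page's behavior did not.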

We've also learned that even high-quality security data isn’t AI-ready out of the box. Structuring data and knowledge for agent consumption requires dedicated engineering.

Our researchers have spent that time identifying, naming, and documenting browser-based attack techniques and encoding that knowledge into a format that agents can operationalize and extend.

Component 3: The thoughtfully organized agents

The engineering challenge isn't getting a model to analyze one browser event; it's keeping it reliable across thousands of events. If you fill a context window with too much data, the model loses the ability to discern signal from noise, a failure mode known as context rot. That's been our primary engineering focus over the last quarter: not making agents objectively smarter, but keeping them focused to improve their outputs.

Our solution is hierarchy. A hunting agent oversees the overall hunt — it understands the query and knows what it's looking for. It dispatches an army of analysis agents, each picking up a single result trace, the term we use for a series of events in a session or tab context.

But even a single trace can contain thousands of events, so each analysis agent breaks it down into blocks, analyzes and summarizes each one, looks for connections between them, and then bubbles up only the interesting signal. Layer by layer, the context narrows until what reaches the top is workable.
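The layered narrowing described above can be sketched as a small hierarchical summarization loop. This is a rough illustration under stated assumptions, not our implementation: the function names, the block size, and the event strings are hypothetical, and keep_flagged is a trivial stand-in for what would be a model call in a real pipeline.

```python
from typing import Callable, Dict, List

# A summarizer takes a block of items and returns one condensed string.
# In a real system this would be a model call; here it is a plain function.
Summarizer = Callable[[List[str]], str]


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive blocks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def analyze_trace(events: List[str], summarize: Summarizer, block_size: int = 4) -> str:
    """Analysis agent: summarize block by block until one block remains."""
    if block_size < 2:
        raise ValueError("block_size must be at least 2 so each layer shrinks")
    level = list(events)
    while len(level) > block_size:
        # Each pass replaces a block of items with a single summary.
        level = [summarize(block) for block in chunk(level, block_size)]
    return summarize(level)


def run_hunt(traces: Dict[str, List[str]], summarize: Summarizer) -> Dict[str, str]:
    """Hunting agent: dispatch one analysis per trace, keep non-empty findings."""
    findings = {tid: analyze_trace(events, summarize) for tid, events in traces.items()}
    return {tid: finding for tid, finding in findings.items() if finding}


def keep_flagged(block: List[str]) -> str:
    """Stand-in summarizer: keep only items carrying a '!' marker."""
    return "; ".join(item for item in block if "!" in item)


traces = {
    "tab-1": ["nav", "click", "!clipboard_write", "scroll", "nav",
              "keypress", "scroll", "!run_dialog_paste", "nav", "click"],
    "tab-2": ["nav", "click", "scroll"],
}
findings = run_hunt(traces, keep_flagged)
print(findings)  # {'tab-1': '!clipboard_write; !run_dialog_paste'}
```

No single call ever sees more than block_size items, which is the point of the hierarchy: the context handed to each step stays small regardless of how long the trace is.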

Different agents handle hypothesis generation, query crafting, triage, deep investigation, detection authoring, and meta-analysis for quality control. We back-test detections against real data before they ship. This segmentation and hierarchy took significant trial and error — you can swap out almost any individual model in the chain, but the hierarchy itself is the thing that ultimately makes it work.
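As an illustration of the back-testing step, the sketch below scores a candidate rule against labeled traces and gates shipping on precision and recall. The metrics, the thresholds, the example rule, and the toy data are all assumptions made for this example; they are synthetic and do not describe Push's actual release criteria or results.

```python
from typing import Callable, Iterable, List, Tuple

Trace = List[str]
Rule = Callable[[Trace], bool]


def backtest(rule: Rule, labeled: Iterable[Tuple[Trace, bool]]) -> Tuple[float, float]:
    """Return (precision, recall) of `rule` over (trace, is_malicious) pairs."""
    tp = fp = fn = 0
    for trace, is_malicious in labeled:
        hit = rule(trace)
        if hit and is_malicious:
            tp += 1
        elif hit:
            fp += 1
        elif is_malicious:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def should_ship(precision: float, recall: float,
                min_precision: float = 0.99, min_recall: float = 0.5) -> bool:
    """Hypothetical release gate; the thresholds are illustrative only."""
    return precision >= min_precision and recall >= min_recall


def candidate_rule(trace: Trace) -> bool:
    """Toy rule: a clipboard write followed anywhere by a run-dialog paste."""
    return "clipboard_write" in trace and "run_dialog_paste" in trace


# Synthetic labeled traces, invented for this example.
labeled_traces = [
    (["clipboard_write", "run_dialog_paste"], True),
    (["nav", "clipboard_write", "run_dialog_paste"], True),
    (["clipboard_write"], False),
    (["nav", "click"], False),
    (["run_dialog_paste"], True),
]

precision, recall = backtest(candidate_rule, labeled_traces)
print(precision, round(recall, 3), should_ship(precision, recall))  # 1.0 0.667 True
```

A gate like this favors precision, on the assumption that a rule wired to real-time enforcement should almost never fire on legitimate activity.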

The consensus coming out of RSAC this year reinforces this approach. The industry's focus has shifted from “which model is the best?” to “how do we build reliable systems around these models?”

What matters is what goes into the model — the telemetry, the domain knowledge, the structured context — and what happens around it: the orchestration, quality control, and feedback loops. That's where production reliability comes from, and it's where you’ll find the need for significant engineering effort.

Component 4: The response engine

Finally, a hunt without a response you can operationalize is just a report. When our agents identify a new technique, the detection they write feeds directly into the same platform that enforces real-time controls in the browser: blocking credential entry on phishing pages, intercepting clipboard injection attacks, warning users during suspicious OAuth consent flows, etc.

Detection and response share the same infrastructure, which means a new technique can go seamlessly from hunt analysis to production enforcement.
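One way to picture that shared path is a policy lookup that maps a detection category to an admin-defined control. This is a simplified sketch under stated assumptions: the category names, the Action values, and the default are hypothetical and are not Push's configuration model.

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    """Hypothetical enforcement actions an admin could choose."""
    BLOCK = "block"
    WARN = "warn"
    MONITOR = "monitor"


# Illustrative admin policy: detection category -> enforcement action.
ADMIN_POLICY: Dict[str, Action] = {
    "aitm_phishing": Action.BLOCK,            # block credential entry
    "clipboard_injection": Action.BLOCK,      # intercept the clipboard write
    "suspicious_oauth_consent": Action.WARN,  # warn the user mid-flow
}


def enforce(category: str, policy: Dict[str, Action],
            default: Action = Action.MONITOR) -> Action:
    """Resolve the action for a detection; unmapped categories fall back to default."""
    return policy.get(category, default)


print(enforce("aitm_phishing", ADMIN_POLICY))            # Action.BLOCK
print(enforce("newly_authored_detection", ADMIN_POLICY)) # Action.MONITOR
```

Because a newly authored detection resolves through the same lookup as existing ones, it needs no separate integration work before it can be enforced.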

Without a browser-layer enforcement tool, any knowledge of emerging attack methods can only be addressed by revoking sessions, resetting passwords, adding (short-lived) domains to a blocklist, wiping compromised machines, and other after-the-fact response actions that keep security teams on the back foot against these kinds of incidents.

Learn more about Push and how we develop new detections

For a deeper look at how the pipeline works in practice, including a step-by-step walkthrough of how we discovered a novel InstallFix attack targeting NotebookLM users, the two-loop detection architecture that creates a compounding effect for customers, and the emerging best practices we've identified for using AI agents in security operations, check out our companion article: Can AI replace a threat researcher? What we learned building an agentic threat hunting pipeline at Push.

If you'd like to see how our agentic detection capabilities apply to your environment, book a demo.

About the author

Kelly Davenport

Product Team
