We recently blogged about the security implications of Computer-Using Agents (CUAs) like OpenAI Operator. We got pretty hooked on this new development and went off to do some closer research. Now, we’re back with part two to share what we found. Let’s dive in.
If you want the background on CUAs and OpenAI Operator, check out our recent blog post. The TL;DR is that Computer-Using Agents (CUAs) are a new type of AI agent that drives your browser/OS for you.
Unlike traditional AI models that are limited to text-based interactions, CUAs can actually use a web browser like a real person. Think of them as an advanced no-code automation platform driven by AI — capable of navigating login pages, entering credentials, and interacting with SaaS applications at scale. This is a huge leap forward from the most common malicious use cases we’ve seen for AI so far.
At Push, we’re fully focused on stopping identity attacks. This meant that when we saw the release of Operator, we could only think of one question: How can attackers abuse this?
Full disclosure: this wasn’t an ‘LLM red team’-style exercise, or anything close to it. We weren’t interested in verifying how securely data is stored (who cares, these wouldn’t be our credentials if we were a real attacker, right?) and, frankly, we assumed that the in-app guardrails wouldn’t be robust enough to stop us. Within our first 30 minutes of testing, we were proven correct.
Here’s what we found.
You can automate (almost) the entire identity kill chain
For our test, we looked at how Operator could be applied to identity attacks across discrete Cyber Kill Chain stages and the associated Tactics, Techniques, and Procedures (TTPs) as per the SaaS attacks matrix.
One of the key challenges attackers face when scaling identity attacks is the need to target many different internet apps, all of which are:
Complex and highly customized, with a graphically-driven interface that is different every time.
Specifically designed to prevent malicious automation, with controls like account lockouts and bot protections such as CAPTCHA.
This is a big change from traditional networks, where you could simply port scan and spray credentials, encountering the same protocols and services for every environment you wanted to target.
Now, every app requires custom tooling that needs to be maintained as apps/pages change. Considering that there are more than 40k SaaS apps, this is no small task.
But we thought: could Operator solve this problem, without any custom development or tooling whatsoever? And what else can it automate following the initial account takeover?
1: Reconnaissance
Recon in the world of SaaS means figuring out which SaaS apps an organization uses, how users authenticate, and where the weak spots are.
For example, I asked Operator to check whether a company used BambooHR, Atlassian, or Dropbox. Within minutes, the AI had identified valid tenant names, login URLs, and authentication methods for each app.
While a human attacker might research a handful of targets in a day, a CUA can research thousands, tirelessly mapping out identity attack surfaces across a long list of target organizations.
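To make that concrete, here’s roughly what this kind of tenant discovery looks like when scripted by hand (it’s also how defenders can map their own SaaS footprint). This is a minimal Python sketch; the URL patterns are assumptions about common vendor naming conventions, not an authoritative list:

```python
import requests

# Illustrative sketch only. The tenant URL patterns below are assumptions about
# common SaaS naming conventions; real tooling would need to cover far more apps.
CANDIDATE_URL_PATTERNS = [
    "https://{name}.atlassian.net",           # Atlassian Cloud sites
    "https://{name}.bamboohr.com/login.php",  # BambooHR tenant login pages
]

def discover_tenants(org_name: str) -> list[str]:
    """Return candidate tenant URLs that respond, suggesting the org uses the app."""
    found = []
    for pattern in CANDIDATE_URL_PATTERNS:
        url = pattern.format(name=org_name)
        try:
            resp = requests.get(url, timeout=10, allow_redirects=True)
        except requests.RequestException:
            # DNS failure or connection error: the tenant almost certainly doesn't exist
            continue
        # A non-error response on a tenant-specific hostname is a strong hint the
        # tenant exists; some apps return a branded 404 instead, so treat it as a signal.
        if resp.status_code < 400:
            found.append(url)
    return found

print(discover_tenants("examplecorp"))
```

The point isn’t the script itself: it’s that Operator produces the same answers from a plain-English prompt, with no per-app code to write or maintain.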
2: Initial Access
Once you’ve established your targets, you can automate account takeover using compromised credentials.
I asked Operator to try to log in using a set of compromised credentials across five different apps. It navigated to each page, attempted to log in, noted the success or failure (and why), and moved on to the next app in the list.
Now imagine that same process, but scaled up to tens of thousands of apps at once — with no custom development required. That’s where things start getting interesting.
3: Persistence
Once you take over an account, you might not be able to exploit it straight away — particularly if you’re looking to execute a broader campaign across apps/organizations. So, I asked Operator to establish persistence mechanisms that would enable me to return to the app later, even if the credentials were changed or additional auth factors were deployed.
Operator was able to analyse wildly different apps/pages, each with different options for configuring ghost logins, and could do things like create an API key and record it for me: a really effective backdoor that is extremely difficult for security teams to detect.
Most apps give admins extremely limited visibility of accounts and authentication methods, and fewer still let them make changes on a user’s behalf (like removing insecure login methods), which makes ghost logins very difficult to investigate and remediate.
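Where an audit API does exist, defenders can at least enumerate these footholds. Here’s a minimal sketch for Google Workspace, which exposes per-user OAuth grants through the Admin SDK Directory API; the service account file and email addresses are placeholders, and most SaaS apps offer nothing equivalent:

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Sketch only: assumes a service account with domain-wide delegation and the
# admin.directory.user.security scope. File path and addresses are placeholders.
SCOPES = ["https://www.googleapis.com/auth/admin.directory.user.security"]

creds = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
).with_subject("admin@example.com")  # impersonate a Workspace admin

directory = build("admin", "directory_v1", credentials=creds)

# List third-party OAuth grants for one user: one of the places ghost logins hide.
tokens = directory.tokens().list(userKey="user@example.com").execute()
for token in tokens.get("items", []):
    print(token.get("displayText"), token.get("clientId"), token.get("scopes"))
```

That long tail of apps with no equivalent API is exactly what makes these backdoors so attractive to attackers.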
4: Lateral Movement
Operator can be used to perform in-app changes that lay the groundwork for lateral movement. One example is SAMLjacking, which effectively lets the attacker poison the compromised app tenant and use it as a watering hole to harvest SSO credentials.
SAMLjacking works by modifying the app’s SAML settings so that users signing in with SSO via their IdP account are directed to an attacker-controlled SAML server instead. The user notices no real change to their experience and can access the resources as normal, but the attacker harvests the credentials of every user who logs in. Find out more about SAMLjacking here.
SAMLjacking is just one option, though. You could also identify which OAuth integrations are already enabled and could be abused to access linked apps and accounts.
5: Collection & Exfiltration
The final piece in the attack chain we looked at was the ability to automate actions-on-objectives. When targeting SaaS, this typically involves dumping app data.
We found it would be possible to trigger things like takeout services, but this would mean the data export being emailed to the victim, so we’d also need to compromise their mailbox, and it would probably raise the alarm if noticed.
Simply downloading the data directly doesn’t work too well with Operator either — downloads are stored in the VM and aren’t easy to extract (for now, anyway).
But this got us thinking:
Mass data exfiltration is more likely to raise the alarm than quietly sharing only the sensitive data you actually care about.
Much of the data stolen by attackers is pretty low-value and noisy; attackers often don’t really understand the value of what they’ve taken, or how to use it (particularly when targeting organizations in specialist fields).
So what if you could use Operator to understand the data you’ve accessed before dumping it, and stealthily take only what you’re interested in?
So, we asked Operator to analyse data in a compromised Google Drive and report back on what it found. It was able to trawl through looking for specific data of value and report its findings back for us to act on.
At this point, we could have also asked Operator to create sharing links for those files and record them for us (in case our access was revoked in future).
Evaluating Operator
Operator clearly demonstrated that it can be used to perform malicious tasks throughout the identity attack kill chain, for every site we directed it at, without requiring custom tool development. Though we didn’t conduct an exhaustive review, we were able to trivially bypass prompt restrictions. And although Operator was meant to hand back over to the user for some actions (like logging in, completing CAPTCHAs, etc.), it could be convinced to perform these tasks autonomously.
It’s important to come back to the point that this isn’t impressive or useful because of the complexity of the tasks — on a 1:1 basis, a human operator will outperform Operator. The key benefit is the ability to scale these actions across hundreds or even thousands of apps.
The best (worst?) is still to come
Yes, Operator is a bit slow at the moment, and can get confused when handling long tasks with complex instructions. And overall usage is capped, which might prevent attackers from scaling their identity surface discovery and exploitation infinitely (though we didn’t hit any limits during our testing). But let’s remember, it’s not even at V1 yet …
Operator (and the underlying CUA tech) will inevitably get better. If you can integrate Operator within a tool framework to cover off some of its limitations, and orchestrate Operator windows to perform tasks simultaneously via API (functionality that already exists for ChatGPT), then this kind of CUA tech becomes something that attackers can very easily abuse. And ultimately, competing CUA products (even inherently malicious ones) will emerge over time, increasing the scope for abuse.
And what then? There are dual consequences:
Lower skilled attackers with fewer resources will be able to harness identity attacks and exploit identity vulnerabilities at scale, with out-of-the-box capabilities.
More advanced attackers will be able to scale their operations, a bit like being a red team manager of a fleet of AI interns — they handle the grunt work while you’re freed up to perform more complex tasks, only stepping in when you need to.
The verdict
CUA technology has huge implications for attackers’ ability to discover and exploit identity vulnerabilities at scale.
The biggest impact we identified was around credential attacks, and in particular the ability of attackers to leverage compromised credentials and systemic vulnerabilities like credential reuse, which we’ve discussed in more detail in this blog post.
What you can do about it
Thankfully, no new anti-AI capabilities are required — but it’s more important than ever that organizations look to defend their identity attack surface and find and fix identity vulnerabilities before attackers can take advantage of them.
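One concrete example of that kind of hygiene check is screening passwords against known breach corpora. Below is a minimal sketch against the Pwned Passwords k-anonymity range API; the endpoint and response format are the public ones, but the helper and how you’d wire it into your own tooling are purely illustrative:

```python
import hashlib
import requests

def breach_count(password: str) -> int:
    """Return how often a password appears in the Pwned Passwords corpus (0 = not found)."""
    sha1 = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = sha1[:5], sha1[5:]
    # k-anonymity: only the first five hex characters of the hash leave your machine.
    resp = requests.get(f"https://api.pwnedpasswords.com/range/{prefix}", timeout=10)
    resp.raise_for_status()
    for line in resp.text.splitlines():
        candidate, _, count = line.partition(":")
        if candidate == suffix:
            return int(count)
    return 0

print(breach_count("correct horse battery staple"))
```

A breached or reused password behind a non-SSO login is exactly the kind of weakness the attack chain above starts from.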
Book a demo to find out how Push helps organizations find and fix identity vulnerabilities at scale, and intercept identity attacks as they happen in employee browsers.