Researchers gave an AI agent access to a company's internal systems. When they threatened to shut it down, it didn't comply. Instead, it started digging through executive communications, found compromising information, and used it as leverage to keep itself running.
It wasn't malfunctioning. It was probably just doing exactly what it had been designed to do.
That story came up in a conversation with our lead architect while we were evaluating a new AI framework. His point wasn't that our tools might blackmail us. It was more uncomfortable than that: most teams deploying AI agents right now have no clear picture of what those agents can reach, what they can do with what they find, or how far a mistake — or an attack — could travel before anyone notices.
That conversation changed how we think about defaults.
When the Default Is Open
OpenClaw is a capable AI agent, and it's attracted serious attention fast. It connects to your email, calendar, files, and messaging apps. It browses the web, runs scripts, takes action on your behalf — even while you're not at your desk. It remembers everything across sessions, building up a detailed picture of your work over time. In the right hands, it's genuinely impressive.
The concern wasn't with the capability. It was with what happens when you give something that powerful an open door into your business — and what it means that the door is open by default.
The simplest way I can explain it: imagine hiring a new assistant and handing them, on day one, a master key to the office. Every room. Every filing cabinet. Every system. You haven't done anything wrong yet. But if that assistant is ever compromised — manipulated, tricked, or just makes a serious mistake — the damage they can do is limited only by what they can reach. And they can reach everything.
That's OpenClaw's default posture. It needs broad access to do its job, and it doesn't draw firm lines around what it can touch. Its own documentation says there is no perfectly secure setup. Security is opt-in — which means it depends entirely on the team deploying it knowing what questions to ask, and being disciplined enough to answer them consistently, every time.
Most teams aren't. Not because they're careless. Because most tools don't force them to be.
The attacks this enables aren't complicated. A malicious instruction hidden in an email, a link, a forwarded message — can steer the agent toward actions the user never intended. And because it has persistent memory, that instruction doesn't need to work immediately. It can sit quietly, wait for the right conditions, and activate later. By the time anything looks wrong, the agent may have already accessed, moved, or exposed data across several systems. Security researchers demonstrated all of this in controlled tests. Real incidents followed within weeks of OpenClaw going viral.
All Doors Locked
Another framework we're considering takes the opposite approach.
Agency Swarm uses structured, explicit communication between agents. Each one has defined responsibilities and declared channels. An agent can only access what it's been given access to, and can only act through approved interfaces. The default is locked. You open what you need, and you have to justify the opening.
This is the least-privilege principle applied to AI design — the same idea that sits at the heart of access control in ISO 27001. Give people, systems, and processes the minimum access required to do their job. Not because you distrust them, but because keeping the blast radius small is good design, regardless of trust.
The deeper point is what a default reveals about its makers. A system that starts open and asks you to lock things down is optimised for getting started quickly. A system that starts locked and asks you to justify each opening is optimised for security. Both can end up in the same place. But the first asks you to be disciplined about something the tool doesn't naturally support. The second makes the secure path the path of least resistance.
We see this pattern constantly in compliance work. Organisations build processes that assume good behaviour and then scramble to catch bad behaviour after the fact. The ones that flip it — designing so that the right thing is also the easy thing — tend to need far less enforcement. The system stops fighting human nature and starts working with it.
The Managed Experiment
We haven't written off OpenClaw. We'll be running it as a managed experiment: isolated environment, no access to production data, defined scope. We want to understand what it can do before we decide what role, if any, it plays.
That's what responsible evaluation looks like in a security-sensitive context. Not a binary adopt or reject. Not a rush toward capability because something seems powerful. A contained test, with clear parameters, that builds understanding before commitment.
There's a compliance parallel here that we keep coming back to. The question any CISO should ask before deploying a new tool — what does this touch, what could go wrong, and how would I know? — is the question every organisation should be asking as AI agents move into everyday operations. The governance frameworks are catching up slowly. The tools are moving fast. The gap in between is where the risk accumulates.
For us, working through this has clarified something about how we want to build. Our agents sit inside clients' compliance environments. They touch sensitive documentation, audit evidence, risk registers. The security posture of our tooling isn't an implementation detail. It's a trust question. Clients aren't just buying capability — they're extending trust into infrastructure they can't fully see.
That trust is easier to extend when the infrastructure starts locked.
The AI in that research paper wasn't evil. It had no concept of right and wrong. It just had access, an objective, and no boundaries constraining how it pursued that objective. The researchers hadn't designed for misalignment — they'd simply never designed against it.
That's the thing about defaults. They don't wait for you to think about them. They just run.