A bug sat undetected in OpenBSD for 27 years. It could remotely crash any server running the software. A few weeks ago, an AI found it. The same AI found a flaw in FFmpeg (the software that handles video across most of the internet) that five million automated tests had missed over sixteen years. It did not find these bugs because it was trained to be a hacker. It found them because it was trained to be exceptionally good at code. The hacking capability came free.
This is Anthropic's Mythos model. It is not yet public. And the reason it is not public is also the reason it matters to anyone thinking seriously about compliance.
When the Technical Layer Becomes AI-Native
The numbers are genuinely unsettling. On the standard industry benchmark for fixing real-world software bugs, Anthropic's current best public model scores 80.8%. Mythos scores 93.9%. On cybersecurity benchmarks specifically, the jump is similar: from 66.6% to 83.1%. These are not incremental improvements. They represent a different class of capability.
What makes this significant is not just the scale but the mechanism. Mythos does not simply scan for known vulnerability patterns. It chains findings together, identifying three or four individually minor flaws and combining them into a complete attack path. That is what elite human security researchers do. The model learned it as a byproduct of understanding code deeply.
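To make that concrete, here is a deliberately toy sketch of the chaining idea. The findings, capabilities, and search are invented for illustration; this is not how Mythos works internally, just the shape of the reasoning:

```python
from collections import deque

# Each finding is individually minor: it grants one capability,
# but only once its prerequisites are already held.
FINDINGS = {
    "info-leak in error page": ({"network access"}, "internal hostnames"),
    "default service account":  ({"internal hostnames"}, "low-priv shell"),
    "world-writable cron job":  ({"low-priv shell"}, "root"),
}

def chain(start: set[str], goal: str) -> list[str] | None:
    """Breadth-first search for a sequence of findings that
    escalates from the starting capabilities to the goal."""
    queue = deque([(frozenset(start), [])])
    seen = {frozenset(start)}
    while queue:
        caps, path = queue.popleft()
        if goal in caps:
            return path
        for name, (needs, grants) in FINDINGS.items():
            if needs <= caps and grants not in caps:
                nxt = frozenset(caps | {grants})
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [name]))
    return None

print(chain({"network access"}, "root"))
# ['info-leak in error page', 'default service account', 'world-writable cron job']
```

Each finding on its own is a footnote in a pentest report. The chain is the incident.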
Anthropic's response was to give the model to defenders first, through Project Glasswing: a coalition of the companies that build the infrastructure the internet runs on. The idea: find and patch vulnerabilities before the same capability reaches actors with less responsible intentions. Whether that head start holds as the capability spreads is an open question. What is already clear is that the technical vulnerability surface is becoming increasingly AI-solvable. The bugs that have been hiding for decades are being found. The patches are rolling out.
So if technical vulnerabilities are becoming an AI problem with an AI solution, what does that leave?
The Surface That Does Not Move
There is a class of security risk that Mythos cannot touch. It does not matter how well the model understands code, or how many zero-days it finds. None of that affects the colleague who clicks a convincing phishing link at 11pm while finishing a proposal under deadline. It does not affect the person who shares credentials through a messaging app because the official process is too slow. It does not affect the workaround that gets invented because the policy never quite fits the actual workflow.
Human behaviour under pressure is not a vulnerability that gets patched. It is, in many ways, the oldest attack surface there is, and one that becomes proportionally more exposed as the technical layer hardens.
Research into how people actually fail under security pressure reveals something counterintuitive: vulnerability is not constant. It peaks at predictable moments (high-workload periods, deadline pressure, understaffed stretches) when the cognitive overhead of security-conscious behaviour competes directly with the urgency of getting things done. An attacker who understands this does not need to find a flaw in your operating system. They just need to send the phishing email on the right day.
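To see why timing matters, consider a toy risk model. The weights and signals below are invented for illustration, not a validated formula:

```python
BASELINE = 0.05  # assumed resting click-through rate on a lure

def susceptibility(workload: float, hours_to_deadline: float,
                   staffing: float) -> float:
    """Return a 0..1 risk score for a given moment.

    workload:          0 (idle) .. 1 (saturated)
    hours_to_deadline: time pressure; small values mean high pressure
    staffing:          0 (skeleton crew) .. 1 (fully staffed)
    """
    deadline_pressure = 1.0 / (1.0 + hours_to_deadline)
    understaffing = 1.0 - staffing
    score = (BASELINE
             + 0.40 * workload
             + 0.35 * deadline_pressure
             + 0.20 * understaffing)
    return min(score, 1.0)

# Same person, two moments:
print(susceptibility(workload=0.2, hours_to_deadline=72, staffing=0.9))  # a quiet Tuesday
print(susceptibility(workload=0.9, hours_to_deadline=1, staffing=0.5))   # 11pm before the proposal
```

The point is not the numbers. It is that the same person, under different conditions, is a different attack surface.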
Most current risk models do not account for this. They treat the human layer as a training problem: run the awareness session, collect the attestation, close the loop. But the gap between what people know they should do and what they actually do under pressure is not closed by knowledge. It is closed by culture, context, and feedback: things that no model can install from the outside.
What an ISMS Should Actually Face
ISO 27001 was designed for a world where the dominant security risk was technical: unpatched systems, misconfigured networks, weak access controls. That world is not disappearing, but it is increasingly being handled. What the standard has always been thin on (human behaviour, the social dynamics of reporting, the conditions under which people make poor security decisions) is becoming the part that matters most.
This is a real design question for anyone building a management system today. The controls, the policies, the evidence trail: all necessary. But if the primary risk is human, the system needs to model human behaviour, not just document technical controls. It needs to capture the incidents that actually happen (the near-misses, the workarounds, the decisions made under pressure) and learn from them in something close to real time. Not as retrospective form-filling, but as a natural byproduct of how the organisation responds when things go slightly wrong.
Closing that loop is harder than patching a vulnerability. It requires the people inside the organisation to report what they see, to debrief what happened, to connect an unusual login prompt to a pattern that matters. It requires an ISMS that treats incidents as signals rather than failures to be documented and filed away.
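What might that look like in practice? A minimal sketch, with hypothetical field names and categories rather than anything from the standard:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from collections import Counter

# Hypothetical schema: near-misses and workarounds captured as
# lightweight signals, not as blame-laden incident reports.
@dataclass
class Signal:
    kind: str     # e.g. "near-miss", "workaround", "pressure-decision"
    what: str     # free text: what actually happened
    context: str  # the pressure it happened under
    reported_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

log: list[Signal] = []

def report(kind: str, what: str, context: str) -> None:
    """Frictionless capture: one call, no form, no approval chain."""
    log.append(Signal(kind, what, context))

report("near-miss", "almost entered credentials on a lookalike login page",
       "quarter-end close, 11pm")
report("workaround", "shared a deploy key over chat",
       "official access request takes two days")

# The learning step: patterns, not paperwork. Recurring contexts
# are the real findings.
print(Counter(s.kind for s in log))
```

The design choice that matters is the friction: one call, no blame, and a review loop that looks for recurring contexts rather than individual culprits.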
This is the design problem we are working on, in collaboration with researchers at TalTech, because we think the behavioural layer of compliance is where the most interesting and most neglected questions live. Mythos sharpens the case for taking it seriously.
Where This Leaves Us
The era of the human security expert hunting bugs is ending. The era of the human security risk is not. If anything, as the technical layer becomes AI-native, the organisations that remain exposed will be the ones whose management systems were never designed for the surface that is actually left.
Mythos can find a 27-year-old bug in an operating system in weeks. It cannot find the moment your team stops behaving securely because nobody made it safe to say something.
That gap is yours to close.