
Frontier AI models are no longer merely helping engineers write code faster or automate routine tasks. They are increasingly capable of spotting the security flaws that human engineers miss.
Anthropic says its newest model, Claude Opus 4.6, excels at discovering the kinds of software weaknesses that underpin major cyberattacks. According to a report from the company’s Frontier Red Team, Opus 4.6 identified more than 500 zero-day vulnerabilities—flaws unknown to the developers who wrote the software and to anyone responsible for patching it—across open-source software libraries during testing. Notably, the model was not explicitly told to search for security flaws; it detected and flagged the issues on its own.
Anthropic says the “results show that language models can add real value on top of existing discovery tools,” but acknowledges that the capabilities are inherently “dual use.”
The same capabilities that help companies find and fix security flaws can just as easily be weaponized by attackers to discover and exploit the vulnerabilities before defenders can find them. An AI model that can autonomously identify zero-day exploits in widely used software could accelerate both sides of the cybersecurity arms race—potentially tipping the advantage toward whoever acts fastest.
Representatives for Anthropic did not immediately respond to a request for comment on the cybersecurity risks. However, Logan Graham, who leads Anthropic’s Frontier Red Team, told Axios that the company views cybersecurity as a competition between offense and defense and wants to ensure defenders get access to these tools first.
To manage some of the risk, Anthropic is deploying new detection systems that monitor Claude’s internal activity as it generates responses, using what the company calls “probes” to flag potential misuse in real time. The company says it’s also expanding its enforcement capabilities, including the ability to block traffic identified as malicious. Anthropic acknowledges this approach will create friction for legitimate security researchers and defensive work, and has committed to collaborating with the security community to address those challenges. The safeguards, the company says, represent “a meaningful step forward” in detecting and responding to misuse quickly, though the work is ongoing.
OpenAI, in contrast, has taken a more cautious approach with its new coding model, GPT-5.3-Codex, also released on Thursday. The company has emphasized that while the model delivers a jump in coding performance, those gains come with serious cybersecurity risks. OpenAI CEO Sam Altman said in a post on X that GPT-5.3-Codex is the first model to be rated “high” for cybersecurity risk under the company’s internal preparedness framework.
As a result, OpenAI is rolling out GPT-5.3-Codex with tighter controls. While the model is available to paid ChatGPT users for everyday development tasks, the company is delaying full API access and restricting high-risk use cases that could enable automation at scale. More sensitive applications are being gated behind additional safeguards, including a trusted-access program for vetted security professionals. OpenAI said in a blog post accompanying the launch that it does not yet have “definitive evidence” the model can fully automate cyberattacks but is taking a precautionary approach, deploying what it described as its most comprehensive cybersecurity safety stack to date, including enhanced monitoring, safety training, and enforcement mechanisms informed by threat intelligence.