Simbian Launches Cyber Defense Benchmark, Reports Frontier LLMs Fall Short on Attack Detection

Photo Credit: Simbian


MOUNTAIN VIEW, Calif., April 29, 2026 (VSNewsNetwork.com) -- Cybersecurity company Simbian has announced the formation of the Simbian Research Lab and the release of the Simbian Cyber Defense Benchmark.

The benchmark is designed to evaluate large language models (LLMs) on their ability to detect MITRE ATT&CK chains in complex scenarios using real attack telemetry. According to Simbian, none of the eleven frontier models tested achieved a passing score when tasked with cyber defense investigations.

Anthropic Claude Opus 4.6 achieved the highest performance among the models tested, detecting an average of 46% of attack evidence per MITRE tactic. Simbian states that every model missed entire attack categories.

“Our research shows you can't throw an LLM dart in the dark and expect to hit the cyber defense bullseye. The same frontier models that perform strongly during cyberattacks struggle on the defense side. Defense is fundamentally harder than offense as it requires reasoning across noisy, partial evidence rather than executing against a known target. The LLMs must be accompanied by outside intelligence in the form of a sophisticated harness. Simbian has been able to get 95% accuracy in production enterprise environments on cyber defense SecOps following some of these techniques,” said Ambuj Kumar, Founder and Chief Executive Officer of Simbian.

The benchmark differs from prior cybersecurity benchmarks by using real attack telemetry in an agentic investigation format rather than curated questions. Models from Anthropic, OpenAI, and Google, along with open-weight models from Alibaba, MiniMax, DeepSeek, and Moonshot AI, were tested using a simple ReAct loop and asked to identify attackers and their associated tactics.
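For readers unfamiliar with the format, the following is a minimal, self-contained Python sketch of what a ReAct-style investigation loop can look like. Every name here (query_llm, search_logs, the action syntax, the sample events) is an illustrative stand-in, not Simbian's actual harness; the model call is stubbed with a canned trajectory so the sketch runs as-is.

```python
import json
import re

def search_logs(query: str) -> str:
    """Stand-in telemetry tool; a real harness would query a SIEM or log store."""
    sample = [
        {"host": "web-01", "event": "powershell.exe -enc <base64>", "hint": "Execution"},
        {"host": "web-01", "event": "schtasks /create /tn updater", "hint": "Persistence"},
    ]
    hits = [e for e in sample if query.lower() in json.dumps(e).lower()]
    return json.dumps(hits or sample)

TOOLS = {"search_logs": search_logs}

def query_llm(messages: list[dict]) -> str:
    """Stub chat-completion call; swap in a real model API here.
    Returns a canned two-turn trajectory so the sketch runs without a key."""
    turns = sum(1 for m in messages if m["role"] == "assistant")
    if turns == 0:
        return "Thought: Check process telemetry first.\nAction: search_logs[powershell]"
    return ("Thought: Encoded PowerShell plus a new scheduled task looks like an intrusion.\n"
            "Final Answer: attacker on web-01; tactics: Execution (TA0002), Persistence (TA0003)")

def investigate(task: str, max_steps: int = 8) -> str:
    """ReAct loop: the model alternates reasoning with tool calls until it answers."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = query_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", reply)
        if action:
            tool, arg = action.groups()
            obs = TOOLS[tool](arg) if tool in TOOLS else f"unknown tool: {tool}"
            messages.append({"role": "user", "content": f"Observation: {obs}"})
    return "no conclusion within the step budget"

print(investigate("Identify the attacker and the MITRE ATT&CK tactics in these logs."))
```

In the benchmark's setting, the tool calls would run against real attack telemetry and the final answer would be scored against ground-truth tactics rather than a canned trajectory.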

Anthropic Claude Opus 4.6 identified three times as many flags as Google Gemini 3 Flash, but at approximately 100 times the cost, according to Simbian.

"We know the large models can do amazing things, but can we measure their efficacy in analyzing machine logs for security events? This benchmark answers that question. In contrast to existing AI security benchmarks, this benchmark was designed to be difficult to game. It uses real telemetry rather than curated questions, mutates context to prevent memorization, enforces deterministic scoring against ground truth, and tracks detection cost alongside accuracy,” said Richard Stiennon, Chief Research Analyst at IT-Harvest.

Full benchmark results are available in a blog post, and the research has been published on arXiv. The company will discuss the findings during a webinar scheduled for April 29.

For more information, visit www.simbian.ai.

Source: Simbian
