OASB: Why AI Agents Need CIS-Style Security Benchmarks
Originally published on opena2a.org
AI agents are being deployed to production faster than security teams can assess them. There is no standard way to measure whether an agent handles credentials safely, enforces least-privilege tool access, or produces auditable logs. OASB (Open Agent Security Benchmark) brings the CIS Benchmark model to agentic AI -- providing a structured, measurable framework for evaluating agent security posture.
The Assessment Gap
CIS Benchmarks transformed server and cloud security by giving teams a shared vocabulary and measurable controls. Before CIS, "is this server secure?" was a subjective question. After CIS, it became "this server passes 87 of 102 Level 1 controls."
AI agents face the same gap today. Teams deploy agents with tool access, API credentials, and autonomous decision-making capabilities, but have no standard way to evaluate whether those agents meet baseline security requirements. Each organization invents its own checklist -- or skips the assessment entirely.
OASB fills this gap with a structured benchmark designed specifically for the security properties that matter in agentic systems.
What OASB Covers
The benchmark defines 46 controls organized across 10 categories, with 3 maturity levels (L1, L2, L3) that map to increasing security rigor.
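To picture how those pieces fit together, here is a minimal TypeScript sketch of a single control record, assuming one plausible structure. The type and field names are hypothetical; only the 10 categories, the 46-control count, and the L1/L2/L3 levels come from the benchmark itself.
// Hypothetical shape of one OASB control; illustrative only,
// not the benchmark's actual schema.
type MaturityLevel = "L1" | "L2" | "L3";
interface OASBControl {
  id: string;              // stable identifier for the control
  category: string;        // one of the 10 control categories
  level: MaturityLevel;    // lowest maturity level that requires it
  requirement: string;     // what the agent or project must do
  rationale: string;       // why the control matters for agent security
}
// A full benchmark run evaluates all 46 controls that apply at the chosen level.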
Running an OASB Assessment
OASB assessments can be run through HackMyAgent. The benchmark mode evaluates an agent or project against OASB controls and produces a compliance score with specific findings.
# Run OASB benchmark against your project
npx hackmyagent benchmark
# Run against a specific OASB level
npx hackmyagent benchmark --level L1
# Output results as JSON
npx hackmyagent benchmark --format json
Each control produces a PASS, FAIL, or NOT_APPLICABLE result with specific evidence. Failed controls include remediation guidance explaining what to change and why it matters.
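As a rough illustration of what the JSON output could contain, the following TypeScript types sketch one plausible report shape. The field names are assumptions, not the documented hackmyagent schema; only the PASS/FAIL/NOT_APPLICABLE statuses, the evidence, the remediation guidance, and the compliance score are described in this post.
// Assumed shape of the JSON benchmark report; field names are
// illustrative, not the documented hackmyagent output schema.
type ControlStatus = "PASS" | "FAIL" | "NOT_APPLICABLE";
interface ControlResult {
  controlId: string;       // which OASB control was evaluated
  status: ControlStatus;   // outcome of the check
  evidence: string;        // what was observed to support the result
  remediation?: string;    // present on FAIL: what to change and why
}
interface BenchmarkReport {
  level: "L1" | "L2" | "L3";
  score: number;           // compliance score across evaluated controls
  results: ControlResult[];
}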
Why Benchmarks Matter for Adoption
Security benchmarks serve two audiences. For engineering teams, they provide a concrete checklist that removes ambiguity from "make it secure." For security and compliance teams, they provide measurable evidence that an agent meets organizational requirements.
As AI agents move from experimental to production workloads, the ability to demonstrate compliance against a recognized benchmark becomes a prerequisite for deployment approval. OASB provides that standard -- open source, vendor-neutral, and designed by practitioners who build and secure agent systems.
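For teams that want to gate deployment approval on a benchmark run, a minimal CI check could wrap the CLI and inspect its JSON output. This sketch assumes the report shape outlined above, which is not confirmed by the CLI documentation; only the benchmark command and its --level and --format json flags appear in this post.
// ci-gate.ts -- hypothetical CI gate around an OASB benchmark run.
// Assumes the JSON report shape sketched earlier; adjust field names
// to match the actual hackmyagent output.
import { execFileSync } from "node:child_process";

const raw = execFileSync(
  "npx",
  ["hackmyagent", "benchmark", "--level", "L1", "--format", "json"],
  { encoding: "utf8" },
);

const report = JSON.parse(raw);
const failed = report.results.filter((r: { status: string }) => r.status === "FAIL");

if (failed.length > 0) {
  console.error(`OASB gate: ${failed.length} control(s) failed at L1`);
  process.exit(1); // block the deployment pipeline
}
console.log("OASB gate: all evaluated L1 controls passed");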
Run Your First OASB Assessment
46 controls. 10 categories. 3 maturity levels. Open source.
npx hackmyagent benchmark
This is a condensed version of the full post. Read the complete article on opena2a.org.
© 2026 OpenA2A. Open source under Apache-2.0 License.