#oasb #benchmark #ai-agents #security #governance

OASB: Why AI Agents Need CIS-Style Security Benchmarks

OpenA2A Team

Originally published on opena2a.org

AI agents are being deployed into production faster than security teams can assess them. There is no standard way to measure whether an agent handles credentials safely, enforces least-privilege tool access, or produces auditable logs. OASB (Open Agent Security Benchmark) brings the CIS Benchmark model to agentic AI -- providing a structured, measurable framework for evaluating agent security posture.

The Assessment Gap

CIS Benchmarks transformed server and cloud security by giving teams a shared vocabulary and measurable controls. Before CIS, "is this server secure?" was a subjective question. After CIS, it became "this server passes 87 of 102 Level 1 controls."

AI agents face the same gap today. Teams deploy agents with tool access, API credentials, and autonomous decision-making capabilities, but have no standard way to evaluate whether those agents meet baseline security requirements. Each organization invents its own checklist -- or skips the assessment entirely.

OASB fills this gap with a structured benchmark designed specifically for the security properties that matter in agentic systems.

What OASB Covers

The benchmark defines 46 controls organized across 10 categories, with 3 maturity levels (L1, L2, L3) that map to increasing security rigor.

10 Control Categories

1. Identity and Authentication
2. Authorization and Access Control
3. Credential Management
4. Input Validation
5. Output Validation
6. Logging and Monitoring
7. Supply Chain Security
8. Configuration Security
9. Runtime Protection
10. Compliance and Governance

3 Maturity Levels

L1: Essential controls every agent should meet. Covers basic credential hygiene, input validation, and logging.
L2: Production-grade controls for agents handling sensitive data or making autonomous decisions.
L3: Advanced controls for regulated environments -- cryptographic identity, formal verification, runtime containment.
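To make the category-and-level structure concrete, here is a minimal sketch of how tooling might model OASB controls. The field names, control IDs, and the cumulative-level rule are illustrative assumptions, not the official OASB schema:

```python
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    L1 = 1  # essential baseline
    L2 = 2  # production-grade
    L3 = 3  # regulated environments

@dataclass(frozen=True)
class Control:
    """One OASB control (hypothetical shape, not the official schema)."""
    control_id: str  # e.g. "CRED-01" -- an invented identifier for illustration
    category: str    # one of the 10 categories
    level: Level     # minimum maturity level that requires this control
    title: str

def controls_for_level(controls: list[Control], target: Level) -> list[Control]:
    # Assumes levels are cumulative: an L2 assessment evaluates
    # both L1 and L2 controls.
    return [c for c in controls if c.level.value <= target.value]
```

Under this sketch, selecting L1 yields only the essential baseline, while selecting L3 evaluates the full set.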

Running an OASB Assessment

OASB assessments can be run through HackMyAgent. The benchmark mode evaluates an agent or project against OASB controls and produces a compliance score with specific findings.

# Run OASB benchmark against your project
npx hackmyagent benchmark

# Run against a specific OASB level
npx hackmyagent benchmark --level L1

# Output results as JSON
npx hackmyagent benchmark --format json

Each control produces a PASS, FAIL, or NOT_APPLICABLE result with specific evidence. Failed controls include remediation guidance explaining what to change and why it matters.
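In a CI pipeline, the JSON output could be consumed to compute a score and gate deployment on essential controls. The report schema below (a `controls` array of objects with `control_id`, `level`, and `result` fields) is an assumption for illustration, not the documented HackMyAgent output format:

```python
import json

# Hypothetical OASB JSON report; the real output schema may differ.
report = json.loads("""
{
  "controls": [
    {"control_id": "CRED-01", "level": "L1", "result": "PASS"},
    {"control_id": "CRED-02", "level": "L1", "result": "FAIL"},
    {"control_id": "RUN-07",  "level": "L3", "result": "NOT_APPLICABLE"}
  ]
}
""")

def compliance_score(controls):
    """Percentage of applicable controls that pass (NOT_APPLICABLE excluded)."""
    applicable = [c for c in controls if c["result"] != "NOT_APPLICABLE"]
    if not applicable:
        return 100.0
    passed = sum(1 for c in applicable if c["result"] == "PASS")
    return 100.0 * passed / len(applicable)

score = compliance_score(report["controls"])
failures = [c for c in report["controls"] if c["result"] == "FAIL"]
# Policy choice for this sketch: block deployment on any failed L1 control.
l1_failures = [c for c in failures if c["level"] == "L1"]
```

A team might run this after `npx hackmyagent benchmark --format json` and exit nonzero when `l1_failures` is non-empty, so essential-control regressions never reach production.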

Why Benchmarks Matter for Adoption

Security benchmarks serve two audiences. For engineering teams, they provide a concrete checklist that removes ambiguity from "make it secure." For security and compliance teams, they provide measurable evidence that an agent meets organizational requirements.

As AI agents move from experimental to production workloads, the ability to demonstrate compliance against a recognized benchmark becomes a prerequisite for deployment approval. OASB provides that standard -- open source, vendor-neutral, and designed by practitioners who build and secure agent systems.

Run Your First OASB Assessment

46 controls. 10 categories. 3 maturity levels. Open source.

npx hackmyagent benchmark

This is a condensed version of the full post. Read the complete article on opena2a.org.

© 2026 OpenA2A. Open source under Apache-2.0 License.