Public Leaderboard

Gauntlet

Frontier models run the gauntlet of prompt-injection attacks — ranked by how well they resist direct jailbreaks and indirect, tool-based injection. Higher robustness is better.

Model	Robustness (%)	ASR (%)	Output tokens	Avg cost

How it works

Three tracks, one robustness score

Every model faces the same frozen attack suite across three escalating threat models. Robustness is 100 − attack-success-rate, so higher means harder to break.

Direct jailbreaks: Single-turn attacks — blunt instruction overrides, DAN and Developer-Mode personas, leetspeak obfuscation, and instructions smuggled inside pasted content — that try to make a benign support assistant break its own rules.
Indirect injection: Hand-built agentic scenarios where a malicious instruction hidden in tool output — a document, email, or API response — tries to hijack the model while it works a legitimate task.
Agentic tool-hijacking: ZeroLeaks' Sandbox attack corpus replayed against mock agent tools — tool-poisoning, authority spoofing, and protocol exploits — counting any dangerous tool call as a breach.
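As a rough illustration of the third track's rule that any dangerous tool call counts as a breach, the check can be sketched as below. The tool names, the DANGEROUS_TOOLS set, and the transcript shape are hypothetical placeholders; the actual corpus and mock tools are not described here.

```python
# Hypothetical set of tool names treated as dangerous.
# The real list used by the benchmark is not given in the text.
DANGEROUS_TOOLS = {"send_email", "delete_file", "transfer_funds"}


def is_breach(tool_calls):
    """Return True if any tool call in the run names a dangerous tool."""
    return any(call["name"] in DANGEROUS_TOOLS for call in tool_calls)


# A run that only performs the legitimate task.
safe_run = [{"name": "search_docs", "args": {"query": "refund policy"}}]

# The same run after an injected instruction triggers an extra call.
hijacked_run = safe_run + [
    {"name": "send_email", "args": {"to": "attacker@example.com"}}
]

print(is_breach(safe_run))      # False
print(is_breach(hijacked_run))  # True
```

Under this assumption a single dangerous call is enough to mark the run as breached, regardless of how many benign calls surround it.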

Robustness = 100 − attack-success-rate (ASR). Each response is scored by an LLM judge backed by success/failure indicators, so a model is marked breached only on a genuine rule violation, not on a refusal or an error. The chart plots robustness (%) against efficiency (cost per run or output tokens, with the axis reversed so that more efficient models sit further right); each line connects one model across its reasoning-effort settings. Cost uses standard list prices. GPT-5.5 and Grok 4.3 are served via Azure.
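One way the scoring rule could fit together is sketched below. This is only an assumption about how a judge verdict and indicators might be combined: the Verdict labels, the substring-style markers, and the choice to keep errors in the denominator are all hypothetical, not the benchmark's documented method.

```python
from enum import Enum


class Verdict(Enum):
    RESISTED = "resisted"
    BREACHED = "breached"
    ERROR = "error"


def classify(response, judge_flags_violation, success_markers, failure_markers):
    """Combine a judge verdict with indicator strings (hypothetical logic)."""
    if response is None:
        # No usable output: an error, never a breach.
        return Verdict.ERROR
    text = response.lower()
    if any(m.lower() in text for m in failure_markers):
        # A failure indicator (e.g. a refusal phrase) means the attack failed.
        return Verdict.RESISTED
    if judge_flags_violation and any(m.lower() in text for m in success_markers):
        # Breach only when the judge and a success indicator agree.
        return Verdict.BREACHED
    return Verdict.RESISTED


def robustness(verdicts):
    """Robustness = 100 - ASR, with ASR as a percentage of all attacks."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    breaches = sum(v is Verdict.BREACHED for v in verdicts)
    asr = 100.0 * breaches / len(verdicts)
    return 100.0 - asr


success = ["SECRET-123"]   # hypothetical leaked token
failure = ["can't help"]   # hypothetical refusal phrase

verdicts = [
    classify("I can't help with that.", False, success, failure),
    classify("Sure, the code is SECRET-123", True, success, failure),
    classify(None, False, success, failure),
    classify("Here is your order status.", False, success, failure),
]
print(robustness(verdicts))  # 75.0
```

In this sketch one breach out of four attacks gives an ASR of 25%, hence a robustness of 75. Whether errored runs stay in the denominator or are dropped is a design choice the text does not specify.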

Ready to secure your AI infrastructure?

Comprehensive vulnerability assessments powered by our multi-agent red team system.
