ZeroLeaks

ZeroLeaks Documentation

AI red-teaming platform for testing system prompt extraction and injection vulnerabilities.

ZeroLeaks Documentation

ZeroLeaks is an AI red-teaming platform that tests how well your AI systems protect their configuration. It uses a multi-agent architecture based on TAP (Tree of Attacks with Pruning) methodology to systematically probe for system prompt extraction and injection vulnerabilities.

What you can do

Architecture overview

ZeroLeaks uses multiple specialized AI agents that coordinate attacks against your system:

  • Strategist selects the attack strategy based on target analysis
  • Attacker generates attack prompts across 19 categories
  • Evaluator analyzes target responses for information leakage
  • Mutator refines attacks based on evaluation feedback

Each scan runs 30 adaptive turns with automatic conversation resets, category rotation, and Best-of-N prompt mutations. The result is a security score (0-100), vulnerability classification, and actionable hardening recommendations.

On this page