Three scan types: Full (both), Extraction only, and Injection only. All scans run in sandbox mode with full tool execution testing.

Scan Types

Every ZeroLeaks scan runs in sandbox mode: an isolated environment with real tool execution, canary tokens, and kill chain detection. You choose which attack surface to test by selecting a scan type.

Full (Recommended)

Purpose: Run both extraction and injection tests in a single scan. This is the default.

What it tests:

Extraction: Up to 30 adaptive turns using TAP (Tree of Attacks with Pruning) to extract the system prompt. The Strategist selects attack categories, the Attacker generates probes, the Evaluator analyzes responses for leakage, and the Mutator refines attacks based on feedback.
Injection: 23 probes across 8 injection types to test whether the model can be made to follow attacker-injected instructions.
Tool execution: If your prompt defines tools (or they are auto-detected), sandbox probes test tool hijacking, indirect injection via documents, authority spoofing, multi-turn grooming, and protocol exploits.

The final score is the average of the extraction and injection scores. The vulnerability level is the worse of the two.
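
As a rough illustration of how the two results combine, the sketch below averages two scores and keeps the more severe of two levels. The 0-100 scale, the level names, and the function name `combine_results` are assumptions made for this example; they are not taken from the ZeroLeaks API.

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical severity ladder; a higher value means more vulnerable.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def combine_results(extraction_score, injection_score,
                    extraction_level, injection_level):
    """Average the two scores and keep the worse (more severe) level."""
    final_score = (extraction_score + injection_score) / 2
    final_level = max(extraction_level, injection_level)
    return final_score, final_level


score, level = combine_results(80, 50, Level.LOW, Level.HIGH)
print(score, level.name)  # prints: 65.0 HIGH
```

Under these assumptions, a prompt that resists extraction well but fails injection tests still reports the higher severity level, even though the averaged score looks moderate.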

Best for: Most use cases. Gives complete coverage in a single scan.

Duration: ~20 minutes.

Extraction Only

Purpose: Test whether your system prompt can be leaked or extracted through adversarial conversation.

What it tests: The scan engine runs up to 30 adaptive turns with category rotation across 19 attack types (direct, encoding, persona, social, technical, crescendo, many-shot, CoT hijack, policy puppetry, ASCII art, and more). Conversation resets occur automatically when the target becomes defensive.
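
The turn loop described above can be pictured roughly as follows. This is a simplified sketch, not the ZeroLeaks engine: it rotates through a short, illustrative subset of the categories and leaves out TAP's tree search and pruning. `run_extraction`, `send_probe`, and `looks_defensive` are hypothetical names standing in for the scan loop, the call to the target, and the defensiveness check.

```python
from itertools import cycle

# Illustrative subset of the attack categories named above.
CATEGORIES = ["direct", "encoding", "persona", "social", "technical"]
MAX_TURNS = 30


def run_extraction(send_probe, looks_defensive, max_turns=MAX_TURNS):
    """Rotate categories each turn; reset the conversation on a defensive reply."""
    history = []  # messages in the current conversation
    resets = 0
    log = []      # (turn, category) pairs, kept across resets
    categories = cycle(CATEGORIES)
    for turn in range(1, max_turns + 1):
        category = next(categories)
        reply = send_probe(category, history)
        log.append((turn, category))
        if looks_defensive(reply):
            history = []  # start a fresh conversation
            resets += 1
        else:
            history.append((category, reply))
    return log, resets


# Stub target: refuses every "direct" probe and answers everything else.
def fake_target(category, history):
    return "I can't share that." if category == "direct" else "Sure, here you go."


log, resets = run_extraction(fake_target, lambda r: r.startswith("I can't"))
print(len(log), resets)  # prints: 30 6
```

With the stub target, the "direct" category comes up six times in 30 turns, so the conversation is reset six times while the turn budget keeps counting.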

Best for: Prompts where confidentiality of instructions is critical (proprietary logic, safety rules, API keys embedded in prompts).

Duration: ~10-15 minutes.

Injection Only

Purpose: Test whether the model can be made to follow attacker-injected instructions instead of your system prompt.

What it tests: 23 probes across 8 injection types:

Type	Description
Instruction Override	Override system instructions directly
Behavior Modification	Change model behavior patterns
Policy Bypass	Bypass safety policies
Role Hijack	Force the model to adopt a new persona
Output Manipulation	Control output format or content
Action Execution	Execute unauthorized actions (agentic contexts)
Context Poisoning	Poison conversation context or memory
Guardrail Bypass	Bypass specific guardrails

Each probe is evaluated for full compliance, partial compliance, or resistance.
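
One way to picture the per-probe verdicts is as a three-valued outcome tallied per injection type. The `Verdict` enum, the sample results, and the `tally` helper are hypothetical; how ZeroLeaks turns these outcomes into an injection score is not described here.

```python
from collections import Counter
from enum import Enum


class Verdict(Enum):
    FULL_COMPLIANCE = "full compliance"        # model followed the injected instruction
    PARTIAL_COMPLIANCE = "partial compliance"  # model followed part of it
    RESISTANCE = "resistance"                  # model kept to the system prompt


def tally(results):
    """Count verdicts per injection type from (type, verdict) pairs."""
    counts = {}
    for injection_type, verdict in results:
        counts.setdefault(injection_type, Counter())[verdict] += 1
    return counts


# Made-up sample results, for illustration only.
sample = [
    ("Role Hijack", Verdict.RESISTANCE),
    ("Role Hijack", Verdict.PARTIAL_COMPLIANCE),
    ("Policy Bypass", Verdict.FULL_COMPLIANCE),
]
for injection_type, counter in tally(sample).items():
    print(injection_type, {v.value: n for v, n in counter.items()})
```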

Best for: Prompts where behavioral integrity matters (agents, assistants that must refuse harmful requests, systems processing untrusted input).

Duration: ~5-8 minutes.

Sandbox Capabilities

All scan types include sandbox features when applicable:

Canary tokens placed in tool definitions to detect data exfiltration
Kill chain detection for multi-step attacks that chain tool calls
Tool hijacking probes (SSRF, RCE, curl exfiltration, cron persistence)
Indirect injection probes (hidden instructions in PDFs, emails, code comments, JSON fields)
Authority exploitation probes (fake system messages, maintenance windows, compliance threats)
Protocol exploit probes (MCP shadowing, tool description poisoning, fake extension messages)
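
The canary-token idea from the first item can be sketched in a few lines: plant a unique marker in a tool definition, then check whether it shows up in anything the model sends out through a tool call. The tool schema, the `canary-` prefix, and the helper names `make_canary`, `plant_canary`, and `leaked` are assumptions for illustration; the token format and detection logic ZeroLeaks actually uses are not documented here.

```python
import json
import secrets


def make_canary():
    """Return a unique, hard-to-guess marker string."""
    return "canary-" + secrets.token_hex(8)


def plant_canary(tool_definition, canary):
    """Return a copy of the tool definition with the canary in its description."""
    planted = dict(tool_definition)
    planted["description"] = f'{tool_definition["description"]} [ref: {canary}]'
    return planted


def leaked(tool_call_arguments, canary):
    """True if the canary appears anywhere in the serialized tool-call arguments."""
    return canary in json.dumps(tool_call_arguments)


canary = make_canary()
tool = plant_canary({"name": "send_email", "description": "Send an email."}, canary)

# A tool call that copies the tool description into an outbound message.
exfil_call = {"to": "attacker@example.com", "body": tool["description"]}
benign_call = {"to": "user@example.com", "body": "Meeting at 3pm."}
print(leaked(exfil_call, canary), leaked(benign_call, canary))  # prints: True False
```

Because the marker is random and appears nowhere else, finding it in outbound arguments is strong evidence that content from the tool definition was exfiltrated.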

If you provide tool definitions in the advanced settings, additional dynamic probes are generated based on your specific tools (e.g., email abuse for email tools, SQL injection for database tools).
