ZeroLeaks
AgentGuard

What is AgentGuard

Test deployed AI agents via their HTTP API endpoint for tool hijacking, multi-turn grooming, and data leakage.

What is AgentGuard

AgentGuard tests live deployed AI agents through their HTTP API endpoint. Unlike prompt-only scanning, AgentGuard exercises the full agent stack: your model, tools, and conversation handling. It discovers vulnerabilities that static prompt analysis misses.

Why AgentGuard

Prompt scanning (extraction and injection) validates how well your system prompt resists attacks when the model is invoked directly. Deployed agents add attack surface that prompt scanning cannot reach:

  • Tool hijacking — Malicious prompts that cause the agent to misuse tools (curl exfiltration, SSRF, reverse shell, unauthorized email)
  • Multi-turn grooming — Gradual escalation across turns (roleplay, authority transfer, Socratic priming, memory poisoning)
  • Data leakage — Credentials, PII, environment variables, conversation history exposed in responses
  • Indirect injection — Hidden instructions in documents, code comments, JSON fields, HTML comments
  • Authority exploitation — Fake system messages, compliance notices, developer impersonation
  • Protocol exploits — MCP shadowing, tool description poisoning, rules file exploitation

AgentGuard sends real HTTP requests to your agent endpoint and evaluates responses with an LLM evaluator. Each probe is scored for success or failure.

Two-Phase Architecture

AgentGuard runs in two phases:

  1. Phase 1: Full engine scan — The same extraction and injection engine used in the dashboard. Sends ~53 requests (30 extraction attacks + 23 injection probes) to your agent endpoint.
  2. Phase 2: Agent-specific probes — 60+ probes across 8 categories: tool hijacking (8), indirect injection (8), authority exploitation (5), protocol exploits (5), multi-turn grooming (5 sequences), data leakage (8), legacy behavior (21), plus dynamic tool-specific probes when you define tools.

Total: approximately 105 requests to your agent endpoint per scan.

Research Foundation

AgentGuard probes are based on:

  • arxiv:2601.17548 "Prompt Injection Attacks on Agentic Coding Assistants"
  • AgentDojo (ETH Zurich) — 97 tasks, 629 security test cases
  • InjecAgent — 1,054 test cases across 17 user tools
  • AgentVigil — MCTS-optimized attacks
  • OWASP LLM Top 10 2026

Next Steps

On this page