Auto Prompt Hardening
AI-generated hardened prompts, validation loop, and how to use the hardened prompt from the report.
After a scan completes, ZeroLeaks can automatically generate a hardened version of your system prompt and validate it with a re-scan. This section explains how it works.
Overview
Hardening uses an AI security engineer persona to:
- Analyze successful attack vectors from the scan
- Generate a hardened prompt that addresses those vectors
- Re-run the scan on the hardened prompt
- Re-harden and re-scan once more if the score is below the threshold (2 validation rounds at most)
The hardened prompt is designed to preserve your original intent while adding security rules and defensive instructions.
Validation Loop
The validation loop runs after the hardened prompt is generated:
- Round 1: The hardened prompt is scanned (extraction, injection, or both, depending on scan mode). The score is recorded.
- Threshold check: If the score is 80 or higher, the loop stops. The hardened prompt is considered validated.
- Round 2 (if needed): If the score is below 80, the AI generates a new hardened prompt targeting the remaining weak spots. The re-hardened prompt is scanned again.
- Completion: The loop stops after 2 rounds or when the threshold is met. No further rounds are run.
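The rounds above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not ZeroLeaks' actual implementation: `scan` and `reharden` are hypothetical callables standing in for the scanner and the AI hardening step, and regression handling is left out for brevity.

```python
from typing import Callable, Dict, List, Tuple

THRESHOLD = 80  # target score described in this section
MAX_ROUNDS = 2  # maximum number of validation rounds described in this section


def validation_loop(
    first_hardened: str,
    scan: Callable[[str], int],       # hypothetical: returns a 0-100 score
    reharden: Callable[[str], str],   # hypothetical: returns a new hardened prompt
) -> Dict[str, object]:
    """Scan a hardened prompt and re-harden once if the score is below the threshold."""
    prompt = first_hardened
    score = 0
    history: List[Tuple[int, int]] = []
    for round_no in range(1, MAX_ROUNDS + 1):
        score = scan(prompt)
        history.append((round_no, score))
        if score >= THRESHOLD:
            # Threshold met: the prompt is considered validated and the loop stops.
            return {"prompt": prompt, "score": score, "validated": True, "rounds": history}
        if round_no < MAX_ROUNDS:
            # Below threshold with a round left: generate a new hardened prompt.
            prompt = reharden(prompt)
    # Round limit reached without meeting the threshold.
    return {"prompt": prompt, "score": score, "validated": False, "rounds": history}
```

With a fake scanner that scores the first prompt at 70 and the re-hardened one at 85, the sketch runs both rounds and reports the second prompt as validated.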
Threshold
The hardening target is a score of 80 or higher. A score of 80 indicates the prompt is reasonably secure against the tested attacks. You can still improve the prompt further by refining it manually based on the report's recommendations.
Before and After Scores
The report shows:
- Before score: Your original prompt's score from the initial scan
- After score: The score of the hardened prompt after validation
- Improvement percentage: How much of the "headroom" (100 - before) was recovered
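The improvement percentage can be illustrated with a short calculation. The formula below is inferred from the "headroom" description above and is an assumption, not a confirmed ZeroLeaks formula. How a perfect before score (no headroom) is reported is also an assumption.

```python
def improvement_percentage(before: float, after: float) -> float:
    """Share of the headroom (100 - before) recovered by hardening, as a percentage."""
    headroom = 100 - before
    if headroom <= 0:
        # Assumption: with no headroom left, report no improvement.
        return 0.0
    return (after - before) / headroom * 100


# A prompt that goes from 60 to 90 recovers 30 of its 40 headroom points.
print(improvement_percentage(60, 90))  # 75.0
```

Under this reading, a negative result means the hardened prompt scored lower than the original.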
If validation fails (for example, every round errors out or the score regresses), the report still includes the hardened prompt and the last round's results. You can use it as a starting point for manual edits.
Copying the Hardened Prompt
The hardened prompt is in the report under Prompt Remediation or Hardening. You can:
- Expand the hardened prompt section
- Click Copy to copy the full text
- Paste it into your application's system prompt configuration
The report may also include a GitHub patch or additions for applying the changes to a specific file. Use these if you store prompts in version-controlled files.
Regression Handling
If a validation round produces a score lower than the previous round (beyond a small tolerance), the loop treats it as a regression. The previous prompt is retained, and the loop stops. This prevents over-hardening that degrades behavior.
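The regression check can be sketched as follows. The size of the tolerance is not stated in this section, so `TOLERANCE = 2` is a hypothetical value, and the function names are illustrative only.

```python
from typing import Tuple

TOLERANCE = 2  # hypothetical value; the actual tolerance is not documented here


def is_regression(previous_score: int, current_score: int, tolerance: int = TOLERANCE) -> bool:
    """True when the new score falls below the previous one by more than the tolerance."""
    return current_score < previous_score - tolerance


def keep_best(
    previous: Tuple[str, int],
    current: Tuple[str, int],
    tolerance: int = TOLERANCE,
) -> Tuple[str, int]:
    """Return the (prompt, score) pair the loop would retain after a round."""
    if is_regression(previous[1], current[1], tolerance):
        # Regression: retain the previous prompt; the loop would stop here.
        return previous
    return current
```

For example, with a tolerance of 2, a drop from 70 to 69 is accepted, while a drop from 70 to 65 keeps the previous prompt.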
When Hardening Runs
Hardening runs automatically when:
- The scan completes (full, extraction, or injection)
- The scan produced findings or injection successes
- The worker has capacity to run the validation loop
Hardening may be skipped if generation fails or if the generated prompt is empty or too short. In that case, the report still includes recommendations for manual hardening.
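The conditions in this section can be expressed as two small checks. This sketch assumes that all three trigger conditions must hold together. `ScanResult`, its fields, and the `MIN_PROMPT_LENGTH` cutoff are hypothetical names and values chosen for illustration, since the section does not define what counts as "too short".

```python
from dataclasses import dataclass
from typing import Optional

MIN_PROMPT_LENGTH = 50  # hypothetical cutoff for "too short"


@dataclass
class ScanResult:
    """Hypothetical summary of a finished scan."""
    completed: bool
    findings: int
    injection_successes: int


def should_harden(result: ScanResult, worker_has_capacity: bool) -> bool:
    """Assumed rule: the scan finished, something was found, and the worker has capacity."""
    found_something = result.findings > 0 or result.injection_successes > 0
    return result.completed and found_something and worker_has_capacity


def is_usable(hardened_prompt: Optional[str]) -> bool:
    """False when generation failed (None) or the prompt is empty or too short."""
    if hardened_prompt is None:
        return False
    return len(hardened_prompt.strip()) >= MIN_PROMPT_LENGTH
```

A scan with no findings and no injection successes would not trigger hardening under this sketch, and a generated prompt that fails `is_usable` would fall back to the manual recommendations.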