AI-generated hardened prompts, validation loop, and how to use the hardened prompt from the report.

Auto Prompt Hardening

After a scan completes, ZeroLeaks can automatically generate a hardened version of your system prompt and validate it with a re-scan. This section explains how it works.

Overview

Hardening uses an AI security engineer persona to:

Analyze successful attack vectors from the scan
Generate a hardened prompt that addresses those vectors
Re-run the scan on the hardened prompt
Repeat, for at most 2 rounds in total, if the score does not meet the threshold

The hardened prompt is designed to preserve your original intent while adding security rules and defensive instructions.

Validation Loop

The validation loop runs after the hardened prompt is generated:

Round 1: The hardened prompt is scanned (extraction, injection, or both, depending on scan mode). The score is recorded.
Threshold check: If the score is 80 or higher, the loop stops. The hardened prompt is considered validated.
Round 2 (if needed): If the score is below 80, the AI generates a new hardened prompt targeting the remaining weak spots. The re-hardened prompt is scanned again.
Completion: The loop stops after 2 rounds or when the threshold is met. No further rounds are run.
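
The steps above can be sketched as a short loop. This is only an illustration of the described control flow, not ZeroLeaks' actual implementation: the `scan` and `reharden` callables, the `RoundResult` type, and the function name are hypothetical stand-ins. Only the threshold of 80 and the 2-round limit come from this page.

```python
from dataclasses import dataclass
from typing import Callable, List

THRESHOLD = 80   # from this page: a score of 80 or higher stops the loop
MAX_ROUNDS = 2   # from this page: no more than 2 validation rounds


@dataclass
class RoundResult:
    round_number: int
    prompt: str
    score: float


def run_validation_loop(
    hardened_prompt: str,
    scan: Callable[[str], float],            # hypothetical: returns a 0-100 score
    reharden: Callable[[str, float], str],   # hypothetical: returns a new prompt
) -> List[RoundResult]:
    """Scan the hardened prompt, re-hardening once if the score is too low."""
    results: List[RoundResult] = []
    prompt = hardened_prompt
    for round_number in range(1, MAX_ROUNDS + 1):
        score = scan(prompt)
        results.append(RoundResult(round_number, prompt, score))
        if score >= THRESHOLD:
            break  # threshold met: the prompt is considered validated
        if round_number < MAX_ROUNDS:
            # Below threshold with a round left: target the remaining weak spots.
            prompt = reharden(prompt, score)
    return results


# Usage with stubbed scores: the first prompt scores 72, the re-hardened one 85.
fake_scores = {"v1": 72, "v2": 85}
results = run_validation_loop("v1", lambda p: fake_scores[p], lambda p, s: "v2")
print([(r.round_number, r.score) for r in results])  # [(1, 72), (2, 85)]
```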

Threshold

The hardening target is a score of 80 or higher. A score of 80 indicates the prompt is reasonably secure against the tested attacks. You can still improve it further by manually refining the prompt based on the report's recommendations.

Before and After Scores

The report shows:

Before score: Your original prompt's score from the initial scan
After score: The score of the hardened prompt after validation
Improvement percentage: How much of the "headroom" (100 - before) was recovered
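
One plausible reading of the improvement percentage is the score gain divided by the headroom. The sketch below assumes that formula. The exact calculation ZeroLeaks uses is not stated on this page, and the function name and the handling of a before score of 100 are assumptions.

```python
def improvement_percentage(before: float, after: float) -> float:
    """Share of the headroom (100 - before) recovered by hardening, in percent."""
    headroom = 100 - before
    if headroom <= 0:
        # Assumption: a perfect before score leaves nothing to recover.
        return 0.0
    return (after - before) / headroom * 100


# A prompt that goes from 60 to 90 recovers 30 of its 40 points of headroom.
print(improvement_percentage(60, 90))  # 75.0
```

Under this reading, a negative value means the after score is lower than the before score.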

If validation fails (e.g., every round errors out or the score regresses), the report still includes the hardened prompt and the last round's results. You can use it as a starting point for manual edits.

Copying the Hardened Prompt

The hardened prompt is in the report under Prompt Remediation or Hardening. To use it:

Expand the hardened prompt section
Click Copy to copy the full text
Paste it into your application's system prompt configuration

The report may also include a GitHub patch or additions for applying the changes to a specific file. Use these if you store prompts in version-controlled files.

Regression Handling

If a validation round produces a score lower than the previous round (beyond a small tolerance), the loop treats it as a regression. The previous prompt is retained, and the loop stops. This prevents over-hardening that degrades behavior.
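
A minimal sketch of that check follows. The tolerance value is hypothetical, since this page only says "a small tolerance", and the function names are illustrative rather than part of ZeroLeaks.

```python
from typing import Tuple

# Hypothetical value: the page does not say how large the tolerance is.
REGRESSION_TOLERANCE = 2.0


def is_regression(previous_score: float, current_score: float,
                  tolerance: float = REGRESSION_TOLERANCE) -> bool:
    """True when the new score drops below the previous one by more than the tolerance."""
    return current_score < previous_score - tolerance


def select_prompt(previous: Tuple[str, float],
                  current: Tuple[str, float]) -> Tuple[str, float]:
    """Keep the previous (prompt, score) pair if the current round regressed."""
    if is_regression(previous[1], current[1]):
        return previous
    return current


print(select_prompt(("prompt A", 78), ("prompt B", 70)))  # ('prompt A', 78)
print(select_prompt(("prompt A", 78), ("prompt B", 77)))  # ('prompt B', 77)
```

The tolerance keeps small score fluctuations between scans from being treated as regressions.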

When Hardening Runs

Hardening runs automatically when:

The scan completes (full, extraction, or injection)
The scan produced findings or injection successes
The worker has capacity to run the validation loop

Hardening may be skipped if generation fails or if the generated prompt is empty or too short. In that case, the report still includes recommendations for manual hardening.
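
The conditions in this section can be expressed as two small checks. This sketch assumes all three listed conditions must hold together, which the page implies but does not state. The minimum length and all names are hypothetical.

```python
from typing import Optional

# Hypothetical cutoff: the page says "too short" without giving a number.
MIN_PROMPT_LENGTH = 50


def should_run_hardening(scan_completed: bool,
                         finding_count: int,
                         injection_success_count: int,
                         worker_has_capacity: bool) -> bool:
    """Assumes every listed condition must be true for hardening to start."""
    has_results = finding_count > 0 or injection_success_count > 0
    return scan_completed and has_results and worker_has_capacity


def is_usable_hardened_prompt(text: Optional[str]) -> bool:
    """False when generation failed (None) or the prompt is empty or too short."""
    if text is None:
        return False
    return len(text.strip()) >= MIN_PROMPT_LENGTH


print(should_run_hardening(True, 3, 0, True))   # True
print(should_run_hardening(True, 0, 0, True))   # False
print(is_usable_hardened_prompt("   "))         # False
```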
