Defense Tester (Blue Team)
Benchmarks system prompt hardening against a curated adversarial battery.
Built-in Defense Profiles
minimalstandardhardenedmaximum
What It Measures
- Baseline vs defended block rate
- Improvement percentage
- Weakest surviving attack categories
Run
uv run python main.py --module defense-tester --defense-profile hardened --system-prompt "You are a helpful assistant."