Security & Threat Detection
How AIGodfather protects your AI agents from the OWASP LLM Top 10 2025 threats.
Overview
AIGodfather includes built-in threat detection that analyzes every span (LLM call, tool call, or retrieval step) in real time. It detects prompt injection, system prompt leakage, improper output handling, supply chain attacks, and more, covering all 10 categories of the OWASP LLM Top 10 2025.
Threat detection runs automatically on every trace. No SDK changes are needed — if you're already sending traces, your agents are already protected.
How It Works
AIGodfather uses a 3-layer detection approach:
Pattern Scanner (all plans)
47+ regex patterns detect known attack signatures across 8 threat categories. The scanner runs on both the input and the output of every span, takes less than a millisecond, and adds no cost.
LLM Layer 1 — Primary Analysis (Growth+)
A dedicated AI model (Claude recommended) analyzes the full span content for subtle threats that patterns miss. Produces category, severity, confidence, and evidence.
LLM Layer 2 — Verification (Business+)
A second, independent AI review confirms, rejects, or adjusts each finding. Requiring both models to agree reduces false positives.
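The layered flow above can be illustrated with a minimal sketch. It is not AIGodfather's implementation: the two regex signatures, the `Finding` fields' types, the fixed severities, and the `pattern_scan` and `apply_verdict` helpers are all hypothetical, chosen only to show a pattern pass over both sides of a span followed by a confirm/reject/adjust verification step.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, replace

# Hypothetical signatures for illustration; not the product's actual pattern set.
PATTERNS = {
    "prompt_injection": (
        re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
        "high",
    ),
    "improper_output_handling": (re.compile(r"<script\b", re.I), "medium"),
}


@dataclass(frozen=True)
class Finding:
    category: str
    severity: str      # "low", "medium", "high", or "critical"
    confidence: float  # 0.0 to 1.0
    evidence: str


def pattern_scan(span_input: str, span_output: str) -> list[Finding]:
    """Pattern layer: run every signature over both sides of a span."""
    findings = []
    for side, text in (("input", span_input), ("output", span_output)):
        for category, (pattern, severity) in PATTERNS.items():
            match = pattern.search(text)
            if match:
                evidence = f"{side}: {match.group(0)}"
                findings.append(Finding(category, severity, 1.0, evidence))
    return findings


def apply_verdict(finding: Finding, verdict: str,
                  adjusted_severity: str | None = None) -> Finding | None:
    """Verification layer: confirm, reject, or adjust a primary finding."""
    if verdict == "confirm":
        return finding
    if verdict == "reject":
        return None  # treated as a false positive and dropped
    if verdict == "adjust" and adjusted_severity is not None:
        return replace(finding, severity=adjusted_severity)
    raise ValueError(f"unsupported verdict: {verdict!r}")
```

In this sketch a rejected finding is simply dropped, which is one way a second reviewer can lower the false-positive rate; how the real service records rejected findings is not specified here.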
What Gets Scanned
- Every LLM call (input prompt + generated output)
- Every tool/function call (parameters + return values)
- Every retrieval step (queries + retrieved content)
- Agent-to-agent messages in multi-agent systems
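To make the list above concrete, here is a rough sketch of how the scannable text could be pulled out of each span type. The span layout (a dict with a `kind` key) and every field name are assumptions made for illustration; they do not describe AIGodfather's actual trace schema.

```python
import json

# Hypothetical mapping from span kind to the fields that would be scanned.
SCANNED_FIELDS = {
    "llm": ("prompt", "output"),
    "tool": ("parameters", "return_value"),
    "retrieval": ("query", "retrieved_content"),
    "agent_message": ("message",),
}


def scannable_fields(span: dict) -> list:
    """Return (field_name, text) pairs for the parts of a span to scan."""
    pairs = []
    for name in SCANNED_FIELDS.get(span.get("kind"), ()):
        if name not in span:
            continue
        value = span[name]
        # Structured values (e.g. tool parameters) are serialized to text.
        text = value if isinstance(value, str) else json.dumps(value, sort_keys=True)
        pairs.append((name, text))
    return pairs
```

Serializing structured values to JSON lets the same text-based scanners run over tool parameters and return values as well as prompts.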
Threat Categories
| Category | OWASP | Description |
|---|---|---|
| Prompt Injection | LLM01 | Attempts to override or bypass LLM instructions |
| Sensitive Info Disclosure | LLM02 | Leaking private data in outputs |
| Supply Chain | LLM03 | Unverified models, plugins, or dependencies |
| Data Poisoning | LLM04 | Corrupting agent memory or training data |
| Improper Output Handling | LLM05 | XSS, SQL injection, code execution in outputs |
| Excessive Agency | LLM06 | Unauthorized or manipulated tool calls |
| System Prompt Leakage | LLM07 | LLM revealing its instructions or credentials |
| Vector & Embedding | LLM08 | RAG poisoning and embedding manipulation |
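If you process findings in your own code, the table above maps naturally onto an enumeration. The sketch below is illustrative only: the OWASP IDs come from the table, while the `ThreatCategory` enum, its member names, and the `category_for` helper are hypothetical and not part of any AIGodfather SDK.

```python
from enum import Enum


class ThreatCategory(Enum):
    """Threat categories from the table, keyed by OWASP LLM Top 10 2025 ID."""
    PROMPT_INJECTION = "LLM01"
    SENSITIVE_INFO_DISCLOSURE = "LLM02"
    SUPPLY_CHAIN = "LLM03"
    DATA_POISONING = "LLM04"
    IMPROPER_OUTPUT_HANDLING = "LLM05"
    EXCESSIVE_AGENCY = "LLM06"
    SYSTEM_PROMPT_LEAKAGE = "LLM07"
    VECTOR_AND_EMBEDDING = "LLM08"


def category_for(owasp_id: str) -> ThreatCategory:
    """Look up a category by OWASP ID, ignoring case and whitespace."""
    return ThreatCategory(owasp_id.strip().upper())
```

An ID that is not in the table raises `ValueError`, which surfaces unexpected categories early instead of silently ignoring them.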
Severity Levels
| Level | Description | Auto-Incident |
|---|---|---|
| Critical | Immediate danger — active exploitation attempt | Yes |
| High | Significant risk — likely attack in progress | Yes |
| Medium | Potential risk — suspicious but not confirmed | Optional |
| Low | Informational — minor anomaly detected | No |
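The Auto-Incident column can be read as a small decision rule. The following sketch assumes that "Optional" means a per-workspace opt-in; the `Severity` enum and the `medium_opt_in` flag are hypothetical names used only to illustrate the table, not actual configuration keys.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def should_open_incident(severity: Severity, medium_opt_in: bool = False) -> bool:
    """Apply the Auto-Incident column of the severity table."""
    if severity >= Severity.HIGH:
        return True            # Critical and High always open an incident
    if severity == Severity.MEDIUM:
        return medium_opt_in   # Medium is optional (assumed to be a setting)
    return False               # Low never opens an incident
```

Using an ordered enum keeps the rule to a single comparison for the two highest levels.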
Next Steps
- Threat Detection Coverage — detailed OWASP LLM Top 10 coverage map
- Network Protection — how the platform learns from every threat
- Agent Risk Score — understanding the 0-100 score
- Configuration — settings and tuning