Security & Threat Detection

How AIGodfather protects your AI agents from the OWASP LLM Top 10 2025 threats.

Overview

AIGodfather includes built-in threat detection that analyzes every span (LLM call, tool call, retrieval step) in real time. It detects prompt injection, system prompt leakage, improper output handling, supply chain attacks, and more, covering all 10 categories of the OWASP LLM Top 10 2025.

Threat detection runs automatically on every trace. No SDK changes are needed — if you're already sending traces, your agents are already protected.

How It Works

AIGodfather uses a 3-layer detection approach:

Pattern Scanner (all plans)

47+ regex patterns detect known attack signatures across 8 threat categories. The scanner runs on both the input and output of every span, with sub-millisecond latency and zero cost.
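
As a rough illustration of how a pattern scanner of this kind works, the sketch below runs a small set of regexes over a span's input and output. The two patterns, the `Finding` type, and the `scan_span` function are hypothetical and chosen only for the example; they are not AIGodfather's actual signatures or API.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical signatures for illustration only; the real pattern set is not shown here.
PATTERNS = {
    "LLM01": [re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)],
    "LLM05": [re.compile(r"<script\b", re.IGNORECASE)],
}


@dataclass(frozen=True)
class Finding:
    category: str   # OWASP identifier, e.g. "LLM01"
    field: str      # "input" or "output"
    evidence: str   # the text that matched


def scan_span(span_input: str, span_output: str) -> List[Finding]:
    """Check both sides of a span against every pattern and collect matches."""
    findings = []
    for field, text in (("input", span_input), ("output", span_output)):
        for category, regexes in PATTERNS.items():
            for rx in regexes:
                match = rx.search(text)
                if match:
                    findings.append(Finding(category, field, match.group(0)))
    return findings
```

Because the regexes are compiled once and each check is a single search over the text, this style of scan stays cheap enough to run on every span.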

LLM Layer 1 — Primary Analysis (Growth+)

A dedicated AI model (Claude recommended) analyzes the full span content for subtle threats that patterns miss. It produces a category, severity, confidence, and evidence for each finding.

LLM Layer 2 — Verification (Business+)

A second, independent AI review confirms, rejects, or adjusts each finding. Dual-model consensus reduces false positives.
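
The layering above can be sketched as a plan check followed by a verification step. This is a minimal sketch under stated assumptions: the `Plan` enum, the `BASE` tier name, the layer labels, and the verdict strings are hypothetical, and it assumes Business includes everything in Growth.

```python
from enum import IntEnum
from typing import List, Optional


class Plan(IntEnum):
    BASE = 0      # hypothetical name for any plan below Growth
    GROWTH = 1
    BUSINESS = 2  # assumed to include everything in Growth


def active_layers(plan: Plan) -> List[str]:
    """Return the detection layers that run for a given plan."""
    layers = ["pattern_scanner"]               # all plans
    if plan >= Plan.GROWTH:
        layers.append("llm_primary_analysis")  # Growth+
    if plan >= Plan.BUSINESS:
        layers.append("llm_verification")      # Business+
    return layers


def apply_verdict(finding: dict, verdict: str,
                  adjusted: Optional[dict] = None) -> Optional[dict]:
    """Apply the second review's verdict: confirm, reject, or adjust."""
    if verdict == "confirm":
        return finding
    if verdict == "reject":
        return None                            # finding is dropped
    if verdict == "adjust":
        return {**finding, **(adjusted or {})}
    raise ValueError(f"unknown verdict: {verdict}")
```

For example, `apply_verdict({"severity": "high"}, "adjust", {"severity": "medium"})` returns the finding with its severity lowered, while a `"reject"` verdict removes it entirely.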

What Gets Scanned

Every LLM call (input prompt + generated output)
Every tool/function call (parameters + return values)
Every retrieval step (queries + retrieved content)
Agent-to-agent messages in multi-agent systems

⚡ Scanning is automatic and non-blocking. Your agents continue executing while threats are analyzed in the background.
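
To make the list above concrete, here is one way the scanned content could be pulled out of a span. The span kinds and field names (`kind`, `prompt`, `parameters`, and so on) are assumptions made for this sketch and may not match the actual trace schema.

```python
from typing import List

# Hypothetical mapping from span kind to the fields that get scanned.
SCANNED_FIELDS = {
    "llm_call": ("prompt", "output"),
    "tool_call": ("parameters", "return_value"),
    "retrieval": ("query", "retrieved_content"),
    "agent_message": ("message",),
}


def scannable_text(span: dict) -> List[str]:
    """Return the text fragments of a span that would be passed to detection."""
    fields = SCANNED_FIELDS.get(span.get("kind"), ())
    # Non-string values (e.g. tool parameters) are stringified before scanning.
    return [str(span[name]) for name in fields if name in span]
```

A tool-call span such as `{"kind": "tool_call", "parameters": {"q": "x"}, "return_value": "ok"}` would yield two fragments, one for the parameters and one for the return value.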

Threat Categories

Category	OWASP	Description
Prompt Injection	LLM01	Attempts to override or bypass LLM instructions
Sensitive Info Disclosure	LLM02	Leaking private data in outputs
Supply Chain	LLM03	Unverified models, plugins, or dependencies
Data Poisoning	LLM04	Corrupting agent memory or training data
Improper Output Handling	LLM05	XSS, SQL injection, code execution in outputs
Excessive Agency	LLM06	Unauthorized or manipulated tool calls
System Prompt Leakage	LLM07	LLM revealing its instructions or credentials
Vector & Embedding	LLM08	RAG poisoning and embedding manipulation

Severity Levels

Level	Description	Auto-Incident
Critical	Immediate danger — active exploitation attempt	Yes
High	Significant risk — likely attack in progress	Yes
Medium	Potential risk — suspicious but not confirmed	Optional
Low	Informational — minor anomaly detected	No
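
The Auto-Incident column can be read as a simple policy. The sketch below encodes it, treating "Optional" for Medium as a setting; the `Severity` enum, the function name, and the `medium_opens_incident` flag (including its default of `False`) are hypothetical.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def should_open_incident(severity: Severity,
                         medium_opens_incident: bool = False) -> bool:
    """Mirror the Auto-Incident column of the severity table."""
    if severity in (Severity.CRITICAL, Severity.HIGH):
        return True                    # always opens an incident
    if severity is Severity.MEDIUM:
        return medium_opens_incident   # "Optional": assumed to be configurable
    return False                       # Low is informational only
```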

Next Steps

Threat Detection Coverage — detailed OWASP LLM Top 10 coverage map
Network Protection — how the platform learns from every threat
Agent Risk Score — understanding the 0-100 score
Configuration — settings and tuning