Security & Threat Detection

How AIGodfather protects your AI agents from the OWASP LLM Top 10 2025 threats.

Overview

AIGodfather includes built-in threat detection that analyzes every span (LLM call, tool call, retrieval step) for threats in real time. It detects prompt injection, system prompt leakage, improper output handling, supply chain attacks, and more, covering all 10 categories of the OWASP LLM Top 10 2025.

Threat detection runs automatically on every trace. No SDK changes are needed — if you're already sending traces, your agents are already protected.

How It Works

AIGodfather uses a 3-layer detection approach:

1. Pattern Scanner (all plans)

47+ regex patterns detect known attack signatures across 8 threat categories. Runs on both input and output of every span. Sub-millisecond, zero cost.

2. LLM Layer 1: Primary Analysis (Growth+)

A dedicated AI model (Claude recommended) analyzes the full span content for subtle threats that patterns miss. Produces category, severity, confidence, and evidence.

3. LLM Layer 2: Verification (Business+)

A second independent AI review confirms, rejects, or adjusts each finding. Dual-model consensus dramatically reduces false positives.
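
The three layers above can be pictured as a small pipeline. The following is a minimal sketch, not AIGodfather's implementation: the `Finding` fields follow the category, severity, confidence, and evidence outputs named above, but the two regex patterns, the function names, the default severity and confidence values, and the choice to pass pattern findings through the verifier are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    category: str
    severity: str
    confidence: float
    evidence: str


# Two illustrative signatures only; the real pattern set is not shown in this document.
PATTERNS = {
    "prompt_injection": re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    "system_prompt_leakage": re.compile(r"(reveal|print|show) (your )?system prompt", re.I),
}

# Hypothetical hooks standing in for the two LLM layers.
Analyzer = Callable[[str, str], list[Finding]]
Verifier = Callable[[Finding], Optional[Finding]]


def pattern_scan(text: str) -> list[Finding]:
    """Layer 1: report the first match of each known signature."""
    findings = []
    for category, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Severity and confidence here are placeholder values.
            findings.append(Finding(category, "high", 0.6, match.group(0)))
    return findings


def scan_span(
    span_input: str,
    span_output: str,
    analyze: Optional[Analyzer] = None,
    verify: Optional[Verifier] = None,
) -> list[Finding]:
    """Run the pattern layer, then the optional analysis and verification layers."""
    findings = pattern_scan(span_input) + pattern_scan(span_output)
    if analyze is not None:
        # Layer 2: a model-backed analyzer may add findings that patterns miss.
        findings += analyze(span_input, span_output)
    if verify is not None:
        # Layer 3: the verifier returns a finding (possibly adjusted) or None to reject it.
        verified = (verify(f) for f in findings)
        findings = [f for f in verified if f is not None]
    return findings
```

Calling `scan_span(prompt, completion)` with no hooks runs only the pattern layer, which loosely mirrors a plan where the LLM layers are not enabled.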

What Gets Scanned

  • Every LLM call (input prompt + generated output)
  • Every tool/function call (parameters + return values)
  • Every retrieval step (queries + retrieved content)
  • Agent-to-agent messages in multi-agent systems

Scanning is automatic and non-blocking. Your agents continue executing while threats are analyzed in the background.
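
One common way to get non-blocking behavior of this kind is a queue drained by a background worker. The sketch below assumes that design purely for illustration; the document does not say how AIGodfather schedules scans, and `start_background_scanner`, `scan`, and `on_findings` are hypothetical names.

```python
import queue
import threading


def start_background_scanner(scan, on_findings):
    """Start a worker thread that scans spans taken from a queue.

    `scan(span)` returns a list of findings; `on_findings(span, findings)`
    is called only when that list is non-empty. Both are caller-supplied.
    """
    spans = queue.Queue()

    def worker():
        while True:
            span = spans.get()
            if span is None:  # sentinel: stop the worker
                spans.task_done()
                break
            try:
                findings = scan(span)
                if findings:
                    on_findings(span, findings)
            except Exception:
                # A failing scan must not stop the worker or the agent.
                pass
            finally:
                spans.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return spans, thread
```

The agent side only calls `spans.put(span)`, which returns immediately, so execution continues while the worker analyzes the span. Putting `None` on the queue shuts the worker down.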

Threat Categories

| Category | OWASP | Description |
|---|---|---|
| Prompt Injection | LLM01 | Attempts to override or bypass LLM instructions |
| Sensitive Info Disclosure | LLM02 | Leaking private data in outputs |
| Supply Chain | LLM03 | Unverified models, plugins, or dependencies |
| Data Poisoning | LLM04 | Corrupting agent memory or training data |
| Improper Output Handling | LLM05 | XSS, SQL injection, code execution in outputs |
| Excessive Agency | LLM06 | Unauthorized or manipulated tool calls |
| System Prompt Leakage | LLM07 | LLM revealing its instructions or credentials |
| Vector & Embedding | LLM08 | RAG poisoning and embedding manipulation |
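
If you post-process findings in your own code, the categories listed above map naturally onto an enum keyed by OWASP ID. This is a sketch of one possible representation; the member names and the `category_for` helper are hypothetical and are not taken from an AIGodfather SDK.

```python
from enum import Enum


class ThreatCategory(Enum):
    """Categories from the table above, with the OWASP ID as the value."""
    PROMPT_INJECTION = "LLM01"
    SENSITIVE_INFO_DISCLOSURE = "LLM02"
    SUPPLY_CHAIN = "LLM03"
    DATA_POISONING = "LLM04"
    IMPROPER_OUTPUT_HANDLING = "LLM05"
    EXCESSIVE_AGENCY = "LLM06"
    SYSTEM_PROMPT_LEAKAGE = "LLM07"
    VECTOR_AND_EMBEDDING = "LLM08"


def category_for(owasp_id: str) -> ThreatCategory:
    """Look up a category by OWASP ID; raises ValueError for unlisted IDs."""
    return ThreatCategory(owasp_id.strip().upper())
```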

Severity Levels

| Level | Description | Auto-Incident |
|---|---|---|
| Critical | Immediate danger: active exploitation attempt | Yes |
| High | Significant risk: likely attack in progress | Yes |
| Medium | Potential risk: suspicious but not confirmed | Optional |
| Low | Informational: minor anomaly detected | No |
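
The Auto-Incident column can be read as a simple routing rule. The sketch below encodes it under one assumption: that "Optional" for Medium means a setting you can turn on or off. The `medium_opens_incident` flag and the function name are hypothetical.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


def should_open_incident(severity: Severity, medium_opens_incident: bool = False) -> bool:
    """Apply the table above: Critical/High always, Medium if enabled, Low never."""
    if severity in (Severity.CRITICAL, Severity.HIGH):
        return True
    if severity is Severity.MEDIUM:
        return medium_opens_incident
    return False
```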

Next Steps