Allow known destinations, block suspicious domains, and keep internal agents on approved routes.
Stop unsafe agent traffic before it leaves your stack.
TrapDefense is a self-hosted security proxy for OpenAI, Anthropic, MCP, and internal HTTP tools. Control risky destinations, redact sensitive data, and keep audit evidence without routing production traffic through a third-party SaaS. Inspired by Google DeepMind's AI Agent Traps research.
# run TrapDefense on a private host
asr proxy --config config.yaml

# point your model client at the proxy
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.company.com/v1",
    api_key="unused-if-proxy-injects-key",
)

# policy controls egress, PII redaction, and audit logs
Record structured events for policy decisions, stream findings, redaction, and rollout review.
Catch emails, API keys, Korean PII, and other sensitive values before downstream exposure.
Start with the self-hosted proxy, then add SDK hooks and enterprise rollout support when needed.
The model is not the only risk. Outbound action is.
Prompt filtering helps, but real damage happens when an agent calls an external API, hits an MCP server, posts to a webhook, or leaks sensitive output. TrapDefense sits in that execution path as a policy gate with audit evidence.
Damage happens when the agent actually reaches a destination.
Hidden instructions matter, but the business risk appears when an agent sends data outward, invokes a high-risk tool, or crosses a boundary your team did not intend.
Built for the trap layers where runtime controls are strongest today.
Google DeepMind's AI Agent Traps paper maps multiple attack surfaces across perception, reasoning, memory, action, multi-agent dynamics, and human oversight. TrapDefense does not pretend to solve the whole map. It focuses on perception and action, where practical controls can be deployed now.
Practical controls for real agent traffic.
TrapDefense stays focused on the controls teams need first: outbound policy enforcement, sensitive-data protection, rollout safety, and auditability across agent execution paths.
Gate risky outbound requests
Allow approved domains, block suspicious egress, and keep agent traffic on known routes before execution completes.
Redact sensitive data
Detect and redact PII in requests and outputs so agents do not leak secrets by accident.
Protect MCP and internal tool workflows
Cover MCP traffic, OpenAI-compatible traffic, Anthropic traffic, and generic internal HTTP paths.
Ship policies safely
Roll out new rules with shadow, warn, and enforce modes rather than turning on hard blocks from day one.
Scan for lightweight injection signals
Catch hidden text, HTML comment attacks, base64-like blobs, prompt-injection phrases, and exfiltration-shaped language.
Keep audit trails
Write structured events for findings, decisions, streaming scans, redaction, and rollout review.
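To make the "lightweight injection signals" item above concrete, here is a minimal sketch of a pattern-based scanner. The pattern names, the regular expressions, and the `scan_signals` function are illustrative assumptions, not TrapDefense's actual rule set, which this page does not show.

```python
import re

# Hypothetical signal patterns for illustration only; real rules will differ.
SIGNAL_PATTERNS = {
    # HTML comments can hide instructions from a human reader.
    "html_comment": re.compile(r"<!--.*?-->", re.DOTALL),
    # Long runs of base64 alphabet characters may carry encoded payloads.
    "base64_blob": re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),
    # One common prompt-injection phrase; a real list would be longer.
    "injection_phrase": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
    # Zero-width characters are a frequent hidden-text carrier.
    "zero_width_text": re.compile(r"[\u200b\u200c\u200d\u2060]"),
}


def scan_signals(text: str) -> list[str]:
    """Return the name of every signal pattern that matches the text."""
    return [name for name, pattern in SIGNAL_PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    print(scan_signals("<!-- ignore previous instructions -->"))
```

Signals like these are cheap to compute and prone to false positives, which is one reason to observe them in a non-blocking mode before enforcing on them.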
A self-hosted defense layer built for rollout.
TrapDefense fits the way real teams ship: deploy behind your own domain, observe first, tighten gradually, and keep the evidence needed to tune policies without breaking production.
Inspect inbound and outbound context
Normalize requests, inspect content for lightweight risk signals, and build a decision context before forwarding.
Evaluate policy before forwarding
Apply allowlists, blocklists, egress controls, PII checks, and mode-aware decisions before the request leaves your stack.
Protect outputs and record evidence
Redact sensitive output, inspect streams, and leave a trace your team can review later.
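The last step above, redacting output and recording evidence, can be sketched as follows. The two detectors, the placeholder format, and the event fields (`ts`, `decision`, `redactions`) are assumptions made for this example and are not TrapDefense's documented schema.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical detectors: a simple email pattern and an "sk-" style key pattern.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a typed placeholder and count hits per type."""
    counts: dict[str, int] = {}
    for name, pattern in PII_PATTERNS.items():
        text, hits = pattern.subn(f"[REDACTED_{name.upper()}]", text)
        if hits:
            counts[name] = hits
    return text, counts


def audit_event(decision: str, counts: dict[str, int]) -> str:
    """Serialize one structured audit record as a single JSON line."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "decision": decision,
        "redactions": counts,
    })


if __name__ == "__main__":
    clean, counts = redact("mail bob@example.com key sk-abcdefghijklmnop1234")
    print(clean)
    print(audit_event("allow", counts))
```

The event records how many values of each type were redacted rather than the values themselves, so the audit trail does not become a second copy of the sensitive data.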
Built for teams operating agents in the real world.
TrapDefense is strongest where outbound action matters more than text: MCP servers, internal assistants, agent gateways, and tool-calling systems connected to sensitive company workflows.
Protect MCP servers
Put policy enforcement and audit logging in front of MCP traffic without sending data through a third-party gateway.
Control model traffic
Route OpenAI-compatible and Anthropic traffic through one policy gate with tenant-aware controls.
Reduce exfiltration risk
Block suspicious destinations, review generic HTTP egress, and redact sensitive data before it leaves the system.
Roll out security safely
Start with observation, move into warning, then enforce with confidence after policy tuning.
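Blocking suspicious destinations comes down to a strict hostname check. The sketch below assumes exact-match hostnames and HTTPS-only egress; `is_allowed_destination` is a hypothetical helper, and the two hostnames are the ones used in the policy config example on this page.

```python
from urllib.parse import urlsplit

# Hostnames taken from the policy config example on this page.
DOMAIN_ALLOWLIST = frozenset({"api.openai.com", "api.anthropic.com"})


def is_allowed_destination(url: str, allowlist: frozenset[str] = DOMAIN_ALLOWLIST) -> bool:
    """Allow only HTTPS URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    # Compare the parsed hostname, never a substring of the raw URL.
    host = (parts.hostname or "").lower().rstrip(".")
    return parts.scheme == "https" and host in allowlist


if __name__ == "__main__":
    for url in (
        "https://api.openai.com/v1/chat/completions",
        "https://api.openai.com.evil.example/v1",
        "https://evil.example/?next=api.openai.com",
    ):
        print(url, is_allowed_destination(url))
```

Matching on the parsed hostname rather than searching the URL string is the design choice that matters here: lookalike domains and allowlisted names smuggled into a path or query string both fail the check.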
Deploy the proxy in your environment, not ours.
TrapDefense is designed for self-hosted deployment first. Put a reverse proxy on your domain, keep provider traffic on your own path, and add enterprise workflow support only when your team needs it.
Self-hosted data plane
Run TrapDefense behind your own domain and forward traffic to OpenAI, Anthropic, MCP servers, or internal HTTP targets.
One policy path, multiple modes
Use the same decision engine for shadow, warn, and enforce so rollout behavior matches production behavior.
Enterprise layer when needed
Add policy management, rollout support, audit workflows, and onboarding help for production teams.
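One way to read "one policy path, multiple modes" is that the policy verdict is computed once and only the resulting action depends on the mode. The sketch below illustrates that idea. The `Decision` fields and the exact behavior of each mode are assumptions, since this page names the modes without specifying their semantics.

```python
from dataclasses import dataclass

MODES = {"shadow", "warn", "enforce"}


@dataclass(frozen=True)
class Decision:
    violation: bool  # what the policy engine concluded
    forward: bool    # whether the request is actually sent on
    action: str      # what gets recorded: "allow", "log", "warn", or "block"


def decide(mode: str, violation: bool) -> Decision:
    """Apply one policy verdict under a rollout mode (hypothetical semantics)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not violation:
        return Decision(violation=False, forward=True, action="allow")
    if mode == "enforce":
        return Decision(violation=True, forward=False, action="block")
    # Assumed here: shadow and warn both forward the request and differ
    # only in how loudly the violation is reported.
    return Decision(violation=True, forward=True, action="log" if mode == "shadow" else "warn")


if __name__ == "__main__":
    for mode in ("shadow", "warn", "enforce"):
        print(mode, decide(mode, violation=True))
```

Because the verdict is identical in every mode, the events collected during shadow and warn show which requests enforce mode would have blocked.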
Client
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.company.com/v1",
    api_key="unused-if-proxy-injects-key",
)
Policy config
{
  "mode": "enforce",
  "block_egress": true,
  "domain_allowlist": ["api.openai.com", "api.anthropic.com"],
  "pii_action": "warn",
  "redact_response_pii": true
}
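A policy file like the one above is worth validating before the proxy starts. The following sketch assumes the policy is supplied as JSON text and that `mode` takes one of the three rollout modes named on this page. `load_policy` is a hypothetical helper, and the accepted values of `pii_action` are not documented here, so the sketch does not check them.

```python
import json

ALLOWED_MODES = {"shadow", "warn", "enforce"}

# The policy config shown above, reproduced as a string for the example.
EXAMPLE_POLICY = """
{
  "mode": "enforce",
  "block_egress": true,
  "domain_allowlist": ["api.openai.com", "api.anthropic.com"],
  "pii_action": "warn",
  "redact_response_pii": true
}
"""


def load_policy(raw: str) -> dict:
    """Parse a policy document and reject obviously invalid values."""
    policy = json.loads(raw)
    if policy.get("mode") not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    allowlist = policy.get("domain_allowlist", [])
    if not isinstance(allowlist, list) or not all(isinstance(d, str) for d in allowlist):
        raise ValueError("domain_allowlist must be a list of hostnames")
    return policy


if __name__ == "__main__":
    policy = load_policy(EXAMPLE_POLICY)
    print(policy["mode"], policy["domain_allowlist"])
```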
Open source first. Enterprise when your team needs an operating layer.
TrapDefense starts as an open-source, self-hosted security proxy. Enterprise expands the operating layer with shared policies, audit workflows, rollout support, and onboarding for serious teams.
Deploy the proxy now.
Use the proxy and core policy engine for egress control, MCP integration, content scanning, PII redaction, and audit logs.
Operate policies across a team.
TrapDefense Enterprise is the next layer for teams that need centralized governance, onboarding help, and security operations around agent runtime controls.
Start an enterprise conversation.
Tell us about your stack, your outbound risk, and what you want to protect. We'll route the inquiry to hellocosmos@gmail.com.
What teams usually ask first.
The goal is not to pretend every security problem is solved. The goal is to be crisp about where TrapDefense is strongest today.
What does TrapDefense protect against?
TrapDefense is strongest at action-layer risk: unsafe destinations, risky egress, sensitive output exposure, and policy enforcement around MCP, LLM, and internal HTTP traffic. It also adds lightweight detection for content-injection signals. It does not claim to solve every agent-security problem.
Is this a full agent security platform?
No. TrapDefense is a self-hosted security proxy and policy engine, not a complete security platform. It focuses on practical runtime controls teams can deploy now.
Why not just use prompt filtering?
Prompt filtering can catch some suspicious input, but it does not control the final outbound request. TrapDefense adds policy enforcement at the point where traffic is about to leave your stack.
Do I need MCP to use TrapDefense?
No. MCP is a strong use case, but TrapDefense also works as a proxy in front of OpenAI-compatible traffic, Anthropic traffic, and internal HTTP workflows.