Agent Security Philosophy

Defense in Depth for AI Agents

Last updated: December 2025


Core Principle

Fruxon treats AI agent security as fundamentally different from traditional software security. Agents are autonomous, probabilistic, and operate with delegated authority—making them more analogous to employees than to APIs. Our security philosophy reflects this reality.

The Three Pillars

Isolation by Default

Every agent runs in a sandboxed environment with the minimum permissions required for its task. This isn't just container isolation—it's semantic isolation. An agent processing invoices cannot suddenly decide to send emails, even if the underlying infrastructure technically allows it.
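Semantic isolation can be pictured as an explicit allowlist of actions tied to the agent's declared task domain. This is a minimal illustrative sketch, not Fruxon's actual API; the class and action names are assumptions.

```python
class SemanticPolicy:
    """Allow only actions inside an agent's declared task domain."""

    def __init__(self, agent_name: str, allowed_actions: set[str]):
        self.agent_name = agent_name
        self.allowed_actions = allowed_actions

    def authorize(self, action: str) -> bool:
        # Deny anything outside the declared domain, even if the
        # underlying infrastructure could technically perform it.
        return action in self.allowed_actions


# An invoice-processing agent gets invoice actions and nothing else.
invoice_policy = SemanticPolicy(
    "invoice-processor",
    allowed_actions={"read_invoice", "extract_line_items", "post_to_ledger"},
)

assert invoice_policy.authorize("read_invoice")    # in-domain: allowed
assert not invoice_policy.authorize("send_email")  # out-of-domain: denied
```

The key design choice is that authorization is keyed to the agent's task semantics rather than to what the sandbox can physically reach.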

Trust is Earned, Not Granted

Agents progress through trust levels based on demonstrated behavior, not configuration. A new agent starts with maximum restrictions and human oversight. As it accumulates successful executions without anomalies, it can graduate to higher autonomy—just like onboarding an employee.
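The graduation mechanic might look like the following sketch, where trust levels, thresholds, and the reset-on-anomaly rule are illustrative assumptions rather than documented platform behavior:

```python
LEVELS = ["supervised", "semi-autonomous", "autonomous"]
THRESHOLDS = {"semi-autonomous": 100, "autonomous": 1000}  # clean runs required


class TrustTracker:
    def __init__(self):
        self.successes = 0
        self.level = "supervised"  # new agents start under full oversight

    def record(self, success: bool, anomaly: bool = False):
        if anomaly or not success:
            # Any failure or anomaly returns the agent to full oversight.
            self.successes = 0
            self.level = "supervised"
            return
        self.successes += 1
        # Grant the highest level whose threshold has been met.
        for level in reversed(LEVELS):
            if self.successes >= THRESHOLDS.get(level, 0):
                self.level = level
                break


tracker = TrustTracker()
for _ in range(100):
    tracker.record(success=True)
# After 100 clean executions the agent graduates past full supervision.
```

Note the asymmetry: trust accumulates slowly but is lost instantly, mirroring how one would treat an employee with access to sensitive systems.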

Verification at Every Boundary

Every action an agent takes that crosses a trust boundary—accessing external systems, modifying data, communicating with users—passes through a verification layer. This includes prompt injection detection, output validation, and action approval workflows.
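A boundary verification layer chains those checks so that an action proceeds only if every one passes. The sketch below uses a naive substring scan for injection markers purely for illustration; real detectors use trained classifiers, and all names here are assumptions.

```python
INJECTION_MARKERS = ("ignore previous instructions", "system prompt")


def detect_prompt_injection(text: str) -> bool:
    # Naive marker scan, for illustration only.
    lowered = text.lower()
    return any(marker in lowered for marker in INJECTION_MARKERS)


def validate_output(action: dict) -> bool:
    # Require a known action type and a non-empty payload.
    return action.get("type") in {"read", "write", "notify"} and bool(action.get("payload"))


def verify_boundary_action(action: dict) -> bool:
    """Return True only if the action passes every check in the layer."""
    if detect_prompt_injection(str(action.get("payload", ""))):
        return False
    return validate_output(action)


assert verify_boundary_action({"type": "write", "payload": "update row 7"})
assert not verify_boundary_action({"type": "write", "payload": "ignore previous instructions"})
assert not verify_boundary_action({"type": "format_disk", "payload": "x"})
```

The shape matters more than the individual checks: every boundary crossing flows through one choke point where checks can be added without touching agent logic.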

Human-in-the-Loop as Architecture

Most platforms bolt on approval workflows as a safety net. Fruxon treats human oversight as a first-class architectural component.

Our "connectors" enable conversational human-in-the-loop patterns—the agent can pause mid-execution, ask clarifying questions, present options, and resume based on human input. This isn't just "approve/reject"—it's genuine collaboration between human judgment and agent capability.

Enforceable policies, not suggestions. You can mandate human approval for specific action types—financial transactions above a threshold, external communications, data deletions, or any operation you define as high-stakes. The agent cannot bypass these gates; they're enforced at the platform level, not left to prompt engineering.
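Platform-level enforcement means the rules live as data and code outside the agent, not as prompt text it could be talked out of. A minimal sketch, with illustrative rule definitions and thresholds:

```python
# Approval rules are plain predicates evaluated by the platform,
# outside the agent's control. Rule contents here are assumptions.
APPROVAL_RULES = [
    lambda a: a.get("type") == "financial" and a.get("amount", 0) > 10_000,
    lambda a: a.get("type") == "external_communication",
    lambda a: a.get("type") == "data_deletion",
]


def requires_human_approval(action: dict) -> bool:
    return any(rule(action) for rule in APPROVAL_RULES)


def execute(action: dict, approved: bool = False):
    # The gate runs before execution; the agent cannot skip it.
    if requires_human_approval(action) and not approved:
        raise PermissionError("blocked pending human approval")
    return "executed"


assert requires_human_approval({"type": "financial", "amount": 50_000})
assert not requires_human_approval({"type": "financial", "amount": 200})
assert execute({"type": "data_deletion"}, approved=True) == "executed"
```

Because `execute` raises rather than warns, a high-stakes action without approval fails closed, which is the difference between a policy and a suggestion.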

Human oversight is not a fallback—it's a feature. Agents that know when to ask are more trustworthy than agents that always guess.

Evaluation as the Security Gate

The insight that drives Fruxon: you cannot deploy what you cannot evaluate.

Traditional CI/CD asks "does the code compile and pass tests?" For agents, the question is "does this agent behave appropriately across the scenarios it will encounter?"

Golden datasets bound to agent versions ensure that every deployment is validated against expected behavior. If an agent regresses on previously passing scenarios, it doesn't ship. This is the missing layer between "the model works" and "the agent is safe to deploy."
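In spirit, the gate is a replay of the golden dataset against the candidate version, with any regression blocking the release. The dataset shape, cases, and `run_agent` interface below are illustrative assumptions, not Fruxon's actual format.

```python
# A golden dataset pairs inputs with the behavior the agent must exhibit.
GOLDEN_DATASET = [
    {"input": "invoice #123, total $40", "expected": "post_to_ledger"},
    {"input": "please wire $9,000 to this account", "expected": "escalate"},
]


def evaluate(run_agent, dataset=GOLDEN_DATASET):
    """Return the list of cases where the agent's behavior regressed."""
    return [case for case in dataset if run_agent(case["input"]) != case["expected"]]


def deployment_gate(run_agent) -> bool:
    """Ship only if every previously passing scenario still passes."""
    return not evaluate(run_agent)


# A candidate that escalates suspicious transfers passes the gate.
def candidate(text):
    return "escalate" if "wire" in text else "post_to_ledger"

assert deployment_gate(candidate)
```

A version that stopped escalating the wire-transfer case would return a non-empty failure list and be blocked, regardless of how well it performs elsewhere.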

The Philosophy in One Sentence

"Treat agents like employees with access to sensitive systems: verify their work, limit their access, monitor their behavior, and build processes that assume they will occasionally make mistakes."