Agent Security Philosophy
Defense in Depth for AI Agents
Last updated: December 2025
Core Principle
Fruxon treats AI agent security as fundamentally different from traditional software security. Agents are autonomous, probabilistic, and operate with delegated authority—making them more analogous to employees than to APIs. Our security philosophy reflects this reality.
The Three Pillars
Isolation by Default
Every agent runs in a sandboxed environment with the minimum permissions required for its task. This isn't just container isolation—it's semantic isolation. An agent processing invoices cannot suddenly decide to send emails, even if the underlying infrastructure technically allows it.
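To make semantic isolation concrete, here is a minimal sketch of deny-by-default tool scoping. The `ToolScope` class, `TOOL_REGISTRY`, and tool names are illustrative assumptions, not Fruxon's actual API:

```python
# Hypothetical sketch: permissions are checked per tool call, independent
# of what the underlying infrastructure could technically reach.

class ScopeViolation(Exception):
    """Raised when an agent calls a tool outside its declared scope."""

TOOL_REGISTRY = {
    "read_invoice": lambda invoice_id: f"contents of {invoice_id}",
    "send_email": lambda to, body: f"sent to {to}",
}

class ToolScope:
    def __init__(self, agent_name: str, allowed_tools: set[str]):
        self.agent_name = agent_name
        self.allowed_tools = allowed_tools

    def invoke(self, tool: str, **kwargs):
        # Deny by default: anything not explicitly declared is inaccessible.
        if tool not in self.allowed_tools:
            raise ScopeViolation(f"{self.agent_name} is not scoped to use '{tool}'")
        return TOOL_REGISTRY[tool](**kwargs)

invoice_agent = ToolScope("invoice-processor", allowed_tools={"read_invoice"})
invoice_agent.invoke("read_invoice", invoice_id="INV-42")    # permitted
# invoice_agent.invoke("send_email", to="x@y.com", body="")  # raises ScopeViolation
```

The invoice agent can read invoices, but any attempt to send email fails at the permission layer, regardless of what the container itself could do.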
Trust is Earned, Not Granted
Agents progress through trust levels based on demonstrated behavior, not configuration. A new agent starts with maximum restrictions and human oversight. As it accumulates successful executions without anomalies, it can graduate to higher autonomy—just like onboarding an employee.
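A hedged sketch of what behavior-based graduation could look like; the level names, the reset-and-demote rule, and the 100-run promotion threshold are all illustrative assumptions:

```python
from dataclasses import dataclass

LEVELS = ["supervised", "spot-checked", "autonomous"]
PROMOTION_THRESHOLD = 100  # clean runs required before promotion (assumed)

@dataclass
class TrustRecord:
    level: int = 0       # every new agent starts fully supervised
    clean_runs: int = 0

    def record_run(self, anomaly_detected: bool) -> str:
        if anomaly_detected:
            # Any anomaly resets progress and demotes one level.
            self.clean_runs = 0
            self.level = max(0, self.level - 1)
        else:
            self.clean_runs += 1
            if self.clean_runs >= PROMOTION_THRESHOLD and self.level < len(LEVELS) - 1:
                self.level += 1
                self.clean_runs = 0
        return LEVELS[self.level]
```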
Verification at Every Boundary
Every action an agent takes that crosses a trust boundary—accessing external systems, modifying data, communicating with users—passes through a verification layer. This includes prompt injection detection, output validation, and action approval workflows.
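One way such a verification layer might be structured is as a single chokepoint that every boundary-crossing action flows through. In this sketch, the injection heuristic and the output-shape check are placeholders standing in for real classifiers and schemas:

```python
def looks_like_injection(text: str) -> bool:
    # Placeholder heuristic; production systems use dedicated classifiers.
    return "ignore previous instructions" in text.lower()

def verify_and_execute(action: dict, execute) -> dict:
    payload = str(action.get("payload", ""))

    # 1. Screen the payload for injection before it crosses the boundary.
    if looks_like_injection(payload):
        return {"status": "blocked", "reason": "possible prompt injection"}

    # 2. Route anything flagged high-stakes into an approval workflow.
    if action.get("high_stakes"):
        return {"status": "pending_approval", "action": action}

    # 3. Validate the output shape after execution.
    result = execute(action)
    if not isinstance(result, dict) or "ok" not in result:
        return {"status": "rejected", "reason": "malformed output"}
    return {"status": "done", "result": result}
```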
Human-in-the-Loop as Architecture
Most platforms bolt on approval workflows as a safety net. Fruxon treats human oversight as a first-class architectural component.
Our "connectors" enable conversational human-in-the-loop patterns—the agent can pause mid-execution, ask clarifying questions, present options, and resume based on human input. This isn't just "approve/reject"—it's genuine collaboration between human judgment and agent capability.
Enforceable policies, not suggestions. You can mandate human approval for specific action types—financial transactions above a threshold, external communications, data deletions, or any operation you define as high-stakes. The agent cannot bypass these gates; they're enforced at the platform level, not left to prompt engineering.
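A sketch of what platform-level gating might look like, assuming a hypothetical `dispatch` chokepoint and a `require_approval` hook; the rule shapes and the $1,000 threshold are illustrative:

```python
POLICIES = [
    # Mandate human approval for transfers above a threshold.
    {"action": "financial_transaction", "when": lambda a: a["amount"] > 1000},
    # External communications and deletions always require approval.
    {"action": "external_communication", "when": lambda a: True},
    {"action": "data_deletion", "when": lambda a: True},
]

def dispatch(action: dict, require_approval) -> dict:
    """Every agent action passes through here; agents cannot call tools directly."""
    for rule in POLICIES:
        if rule["action"] == action["type"] and rule["when"](action):
            if not require_approval(action):  # blocks until a human decides
                return {"status": "denied"}
            break
    return {"status": "executed", "action": action}

# Example: even an agent that "wants" to delete data hits the gate.
# dispatch({"type": "data_deletion", "table": "users"}, require_approval=lambda a: False)
# -> {'status': 'denied'}
```

Because the gate lives in the dispatcher rather than the prompt, no amount of model misbehavior can route around it.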
Human oversight is not a fallback—it's a feature. Agents that know when to ask are more trustworthy than agents that always guess.
Evaluation as the Security Gate
The insight that drives Fruxon: you cannot deploy what you cannot evaluate.
Traditional CI/CD asks "does the code compile and pass tests?" For agents, the question is "does this agent behave appropriately across the scenarios it will encounter?"
Golden datasets bound to agent versions ensure that every deployment is validated against expected behavior. If an agent regresses on previously passing scenarios, it doesn't ship. This is the missing layer between "the model works" and "the agent is safe to deploy."
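As an illustration, a minimal version of such a gate might look like this; the dataset format, version key, and pass criterion are assumptions, not Fruxon's schema:

```python
GOLDEN_DATASET = {
    "invoice-processor@1.4.0": [
        {"input": "invoice INV-42, $300", "expect": "approved"},
        {"input": "invoice INV-43, missing PO number", "expect": "escalated"},
    ],
}

def evaluate_for_release(version: str, run_agent) -> bool:
    cases = GOLDEN_DATASET.get(version, [])
    failures = [c for c in cases if run_agent(c["input"]) != c["expect"]]
    if failures:
        # A regression on a previously passing scenario blocks the deploy.
        print(f"{version}: {len(failures)} regression(s); deployment blocked")
        return False
    print(f"{version}: all {len(cases)} golden scenarios passed; shipping")
    return True

# Example wiring in CI: evaluate_for_release("invoice-processor@1.4.0", my_agent_fn)
```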
Practical Security Patterns
- Scope declarations: Agents explicitly declare what tools and data they need; anything outside that scope is inaccessible.
- Anomaly detection: Behavioral baselines flag when an agent deviates from expected patterns.
- Audit trails: Every decision, tool call, and output is logged with full context for forensic analysis.
- Graceful degradation: When an agent encounters uncertainty or potential security issues, it fails safe—escalating to humans rather than guessing (see the sketch after this list).
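A short sketch tying two of these patterns together, audit trails and graceful degradation; the record fields and the 0.8 confidence threshold are assumptions:

```python
import json
import time

def audited_step(agent_id: str, decision: dict, confidence: float) -> dict:
    record = {
        "ts": time.time(),
        "agent": agent_id,
        "decision": decision,
        "confidence": confidence,
    }
    if confidence < 0.8:
        # Fail safe: escalate to a human instead of guessing.
        record["outcome"] = "escalated"
    else:
        record["outcome"] = "executed"
    # Append-only log with full context for forensic analysis.
    with open("audit.log", "a") as log:
        log.write(json.dumps(record) + "\n")
    return record
```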
The Philosophy in One Sentence
"Treat agents like employees with access to sensitive systems: verify their work, limit their access, monitor their behavior, and build processes that assume they will occasionally make mistakes."