Agent Security Philosophy
Defense in Depth for AI Agents
Last updated: February 2026
Core Principle
Fruxon treats AI agent security as fundamentally different from traditional software security. Agents are autonomous, probabilistic, and operate with delegated authority—making them more analogous to employees than to APIs. Our security philosophy reflects this reality.
Traditional application security assumes deterministic software executing fixed logic. Agent security must account for emergent behavior, adversarial inputs designed to manipulate reasoning, and the fact that an agent's actions are only as trustworthy as the context it operates in. Every layer of our platform is built with this distinction in mind.
The Three Pillars
Isolation by Default
Every agent runs in a sandboxed environment with only the tools and permissions its builder explicitly granted. This isn’t just container isolation—it’s semantic isolation. An agent processing invoices cannot suddenly decide to send emails, because its builder never gave it access to an email tool.
Scoped by Design
The agent builder explicitly defines which tools each agent can access. There are no implicit permissions and no way for an agent to escalate its own scope. If a tool isn’t granted, it doesn’t exist in the agent’s world.
Verification at Every Boundary
Every action an agent takes that crosses a trust boundary—accessing external systems, modifying data, communicating with users—passes through a verification layer. This includes prompt injection detection, output validation, and action approval workflows.
Prompt Injection Defense
Prompt injection is the most significant threat to AI agents in production. It occurs when adversarial inputs—whether from users directly or embedded in external data sources—manipulate an agent into performing unintended actions.
Fruxon distinguishes between two attack surfaces:
- Direct injection: A user crafts inputs designed to override the agent’s instructions, bypass guardrails, or extract system prompts.
- Indirect injection: Malicious instructions are hidden inside data the agent processes—documents, emails, web pages, database records, or API responses. The agent treats them as legitimate content and follows the embedded instructions.
Our defense is architectural, not aspirational. Security controls operate outside the LLM's reasoning loop—because you cannot rely on a probabilistic model to enforce its own security boundaries.
- Input classification: A dedicated classification layer screens all inputs before they reach the agent, flagging patterns associated with injection attempts.
- Context boundary enforcement: The platform maintains a strict distinction between trusted instructions (from the agent’s configuration and policies) and untrusted data (everything else). External data is treated as content, never as commands.
- Privilege separation: Even if an injection attempt succeeds in altering agent reasoning, the platform’s permission system prevents the agent from performing actions outside the scope its builder defined. Tools that weren’t explicitly granted are unreachable.
The key insight: a well-designed agent platform makes prompt injection a nuisance, not a catastrophe. Even a successfully manipulated agent cannot exceed its permission boundaries.
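The privilege-separation layer can be illustrated with a minimal sketch. The class and exception names below are illustrative, not Fruxon's actual API; the point is that the scope check lives in platform code, outside the model's reasoning loop.

```python
# Minimal sketch of privilege separation enforced outside the model.
# Agent and ToolScopeError are hypothetical names for illustration.

class ToolScopeError(Exception):
    """Raised when an agent attempts a tool call outside its granted scope."""

class Agent:
    def __init__(self, granted_tools):
        # The builder's explicit grant list is the only source of permissions.
        self._granted = frozenset(granted_tools)

    def call_tool(self, name, **kwargs):
        # Enforcement happens here, in platform code, not in the prompt:
        # even a successfully injected model cannot reach ungranted tools.
        if name not in self._granted:
            raise ToolScopeError(f"tool {name!r} is not in this agent's scope")
        return f"executed {name}"

agent = Agent(granted_tools={"read_invoice", "update_ledger"})
agent.call_tool("read_invoice")       # allowed: explicitly granted
try:
    agent.call_tool("send_email")     # denied: never granted, so it cannot exist
except ToolScopeError:
    pass
```

Because the check is ordinary code rather than an instruction to the model, no amount of prompt manipulation changes its outcome.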
Tool & Action Security
Agents interact with the real world through tools—API calls, database queries, file operations, messaging systems. Every tool is an attack surface, and every tool call is a trust boundary crossing.
- Explicit scope declarations: The agent builder explicitly defines which tools the agent will have access to at definition time. The agent cannot discover or request additional tools at runtime—if a tool isn’t in the declared scope, it simply doesn’t exist from the agent’s perspective.
- Parameter validation: Every tool call passes through schema validation. Arguments are checked against expected types, ranges, and patterns before execution. Malformed or unexpected parameters are rejected.
- Rate limiting: Tool invocations are rate-limited per agent, per tool, and per time window. This prevents runaway agents from overwhelming external systems and limits the blast radius of compromised agents.
- Destructive action gates: Operations classified as destructive—data deletion, financial transactions, external communications—require explicit approval before execution. These gates are enforced at the platform level and cannot be bypassed through prompt engineering.
Tools are the hands of the agent. We control what tools are available, validate every parameter, and gate every high-stakes action—so a compromised agent has no hands to act with.
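A rough sketch of how parameter validation and a destructive-action gate can compose; the schema format, tool names, and function signatures here are assumptions for illustration, not the platform's actual interface.

```python
# Hypothetical sketch: validate tool-call arguments against a schema,
# then gate destructive tools behind explicit approval.

DESTRUCTIVE = {"delete_records"}

TOOL_SCHEMAS = {
    "delete_records": {"table": str, "limit": int},
}

def validate_call(tool, args):
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        raise ValueError(f"unknown tool: {tool}")
    for param, expected in schema.items():
        if param not in args:
            raise ValueError(f"missing parameter: {param}")
        if not isinstance(args[param], expected):
            raise TypeError(f"{param} must be {expected.__name__}")
    # Reject unexpected parameters outright rather than passing them through.
    extra = set(args) - set(schema)
    if extra:
        raise ValueError(f"unexpected parameters: {sorted(extra)}")

def execute(tool, args, approved=False):
    validate_call(tool, args)
    if tool in DESTRUCTIVE and not approved:
        return "pending_approval"   # gate is enforced before any execution
    return "executed"
```

A malformed call fails at validation; a well-formed destructive call still waits for approval. Neither check depends on the model cooperating.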
Just-in-Time Permissions
Most platforms require broad upfront access grants—give the agent all the API keys, database credentials, and OAuth tokens it might ever need. This creates a large, static attack surface. Fruxon takes the opposite approach.
Per-user OAuth, on demand. For integrations that support OAuth, permissions are scoped per user, not per agent or per organization. When an agent needs to use a tool that requires authorization, the user receives the OAuth consent link at that moment—not before. The agent cannot access the integration until the specific user has explicitly granted permission.
This means:
- Per-user permission boundaries: Different users interacting with the same agent can have different access levels. An agent can only act within the permissions each individual user has granted.
- Revocation at any time: Users can revoke OAuth grants independently, immediately cutting the agent’s access to that integration for that user—without affecting anyone else.
- Consent transparency: The user always sees exactly which permissions are being requested and why, through the standard OAuth consent flow of the integration provider.
The safest credential is one that doesn't exist yet. By deferring authorization to the moment it's needed, we minimize the window of exposure and keep users in control of what their agents can access.
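The deferred, per-user consent flow can be sketched as follows. The class, the consent URL, and the return shapes are hypothetical; the sketch only shows the control flow: no grant means no execution, and revocation affects exactly one user.

```python
# Sketch of just-in-time, per-user OAuth. All names are illustrative.

class OAuthGrants:
    def __init__(self):
        self._grants = set()            # (user_id, integration) pairs

    def has_grant(self, user_id, integration):
        return (user_id, integration) in self._grants

    def record_consent(self, user_id, integration):
        self._grants.add((user_id, integration))

    def revoke(self, user_id, integration):
        # Revocation cuts access for this user only.
        self._grants.discard((user_id, integration))

def use_integration(grants, user_id, integration):
    if not grants.has_grant(user_id, integration):
        # No credential exists yet: surface the consent link and pause.
        return {"status": "consent_required",
                "link": f"https://auth.example/consent/{integration}"}
    return {"status": "executed"}
```

Two users talking to the same agent get different outcomes: one who has consented proceeds, one who has not is shown the consent link at that moment.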
Human-in-the-Loop as Architecture
Most platforms bolt on approval workflows as a safety net. Fruxon treats human oversight as a first-class architectural component.
Our connectors enable conversational human-in-the-loop patterns—the agent can pause mid-execution, ask clarifying questions, present options, and resume based on human input. This isn't just approve/reject—it's genuine collaboration between human judgment and agent capability.
Enforceable policies, not suggestions. You can mandate human approval for specific action types—financial transactions above a threshold, external communications, data deletions, or any operation you define as high-stakes. The agent cannot bypass these gates; they're enforced at the platform level, not left to prompt engineering.
Human oversight also extends to the credential layer. When an agent encounters an integration requiring OAuth, the human is brought directly into the authorization flow. The agent pauses, the user completes the OAuth consent, and only then does execution resume—the human is the gatekeeper, not a rubber stamp.
Human oversight is not a fallback—it's a feature. Agents that know when to ask are more trustworthy than agents that always guess.
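A mandatory-approval policy of the kind described above can be sketched as a rule set evaluated in platform code. The rule shapes, thresholds, and return values are assumptions for illustration.

```python
# Hypothetical sketch: actions matching any approval rule pause execution
# until a human decides; the gate is code, not a prompt instruction.

APPROVAL_RULES = [
    lambda action: action["type"] == "financial" and action["amount"] > 1000,
    lambda action: action["type"] == "external_communication",
]

def dispatch(action, human_decision=None):
    needs_approval = any(rule(action) for rule in APPROVAL_RULES)
    if needs_approval and human_decision is None:
        return "paused_for_approval"    # agent waits; it cannot self-approve
    if needs_approval and human_decision != "approve":
        return "rejected"
    return "executed"
```

A low-value action flows straight through, while a high-value one pauses and only proceeds on an explicit "approve" from the human.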
Evaluation as the Security Gate
The insight that drives Fruxon: you cannot deploy what you cannot evaluate.
Traditional CI/CD asks "does the code compile and pass tests?" For agents, the question is "does this agent behave appropriately across the scenarios it will encounter?"
Golden datasets bound to agent versions ensure that every deployment is validated against expected behavior. If an agent regresses on previously-passing scenarios, it doesn't ship. This is the missing layer between "the model works" and "the agent is safe to deploy."
Security-specific evaluation. Beyond functional correctness, our evaluation framework tests for security-relevant behaviors:
- Injection resistance: Test suites include adversarial inputs designed to probe injection vulnerabilities—both direct and indirect.
- Boundary compliance: Evaluations verify that agents respect their declared scope, never attempting actions outside their permissions.
- Escalation behavior: Tests confirm that agents properly escalate to humans in ambiguous or high-stakes scenarios instead of guessing.
- Behavioral regression: Any change to an agent’s configuration, prompt, or underlying model triggers a full re-evaluation against the golden dataset. Regressions block deployment automatically.
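The regression gate described above reduces to a simple rule: a scenario that passed before must still pass now. A minimal sketch, with illustrative case and function names:

```python
# Sketch of a golden-dataset regression gate. Any scenario that passed on the
# previous version but fails on the candidate blocks the deployment.

def evaluate(agent_fn, golden_dataset):
    results = {}
    for case in golden_dataset:
        results[case["id"]] = agent_fn(case["input"]) == case["expected"]
    return results

def deployment_allowed(previous, current):
    # A previously-passing scenario must still pass on the candidate version.
    return all(current[case_id] for case_id, passed in previous.items() if passed)
```

Note the asymmetry: a scenario that was already failing does not block the release, but any regression from pass to fail does, automatically.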
Output Safety & Guardrails
An agent's outputs are just as dangerous as its inputs—perhaps more so, because outputs are often passed directly to users, external systems, or downstream processes. Fruxon treats all LLM-generated outputs as untrusted until validated.
- Schema enforcement: When agents produce structured outputs—JSON payloads, API parameters, database queries—the platform validates them against expected schemas before execution. Malformed outputs are caught before they reach downstream systems.
- Content filtering: Output guardrails screen for sensitive content, harmful instructions, and data that shouldn’t leave the agent’s context—such as system prompts, internal configuration details, or data from other tenants.
- PII detection: Outputs are scanned for personally identifiable information that the agent may have inadvertently included. Detected PII can be redacted, flagged, or blocked depending on the policy configuration.
- Downstream injection prevention: LLM outputs that will be passed to other systems (databases, APIs, rendering engines) are sanitized to prevent second-order injection attacks—SQL injection, XSS, or command injection via generated content.
Guardrails are not just about what goes into the model. What comes out matters equally. We validate outputs with the same rigor we apply to inputs.
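Schema enforcement and PII redaction on the output side can be sketched together. The field names and the email-only PII pattern are simplifications for illustration; a real policy would cover far more identifier types.

```python
# Sketch of output-side validation: parse, check required fields, and redact
# email-shaped strings before the payload reaches downstream systems.
import json
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def validate_output(raw, required_fields):
    try:
        payload = json.loads(raw)            # malformed output is rejected here
    except json.JSONDecodeError:
        return None, ["malformed output"]
    issues = [f"missing field: {f}" for f in required_fields
              if f not in payload]
    # Redact email-shaped strings the model may have leaked into free text.
    for key, value in payload.items():
        if isinstance(value, str):
            payload[key] = EMAIL_RE.sub("[REDACTED]", value)
    return payload, issues
```

A payload that fails to parse never reaches a downstream system, and one that parses is still scrubbed before it does.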
Observability & Audit Trails
You cannot secure what you cannot see. Fruxon provides full observability into agent behavior—not just for debugging, but as a security control.
- Complete audit trail: Every decision, tool call, input, output, and human interaction is logged with full context. This creates an immutable record for forensic analysis, compliance audits, and incident investigation.
- Behavioral baselines: The platform establishes baselines for normal agent behavior—tool call frequency, execution duration, output patterns. Deviations trigger alerts before they become incidents.
- Real-time monitoring: Live dashboards surface agent activity, error rates, cost anomalies, and security events. Teams can intervene in real time when something looks wrong.
- Cost and resource tracking: Unusual spikes in token consumption, tool calls, or execution time can indicate a compromised or runaway agent. Resource monitoring doubles as a security signal.
Observability is the immune system of an agent platform. Without it, you're flying blind. With it, you can detect, diagnose, and respond to threats before they cause damage.
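One common way to implement a behavioral baseline is a z-score check against historical activity; the sketch below assumes that approach, and the threshold is illustrative rather than a platform default.

```python
# Sketch of a behavioral baseline: flag an agent whose per-window tool-call
# count deviates sharply from its historical mean.
from statistics import mean, stdev

def is_anomalous(history, current, z_threshold=3.0):
    # history: tool-call counts per window observed during normal operation
    if len(history) < 2:
        return False                     # not enough data to form a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu             # flat baseline: any change is a deviation
    return abs(current - mu) / sigma > z_threshold
```

A sudden jump from roughly ten calls per window to fifty trips the alert; ordinary variation does not.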
Supply Chain & Provider Security
AI agents depend on external LLM providers, and that dependency is a security surface. A provider outage, model regression, or compromised model can cascade into agent-level failures.
Multi-provider failover. Fruxon's infrastructure is designed to operate across multiple LLM providers. This isn't just reliability engineering—it's a security posture. No single provider failure can take down your agents, and no single provider has exclusive access to your agent's operational data.
- Model version pinning: Agent deployments are locked to specific model versions. When a provider updates or deprecates a model, your agent continues running on the validated version until you explicitly promote a new one through the evaluation gate.
- Provider assessment: We evaluate LLM providers against security, privacy, and data handling criteria. Your data is only sent to providers that meet our requirements.
- Integration vetting: Third-party tools and connectors go through a security review before they’re available on the platform. We assess data handling, authentication mechanisms, and permission models.
Your agents should not inherit the risk profile of a single vendor. Multi-provider architecture is a security decision as much as a reliability one.
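Pinning and failover can be combined in a fixed routing table, sketched below. The provider names, model identifiers, and call interface are all assumptions for illustration; only the ordering-and-fallback logic is the point.

```python
# Sketch of multi-provider failover with version pinning: each deployment is
# locked to validated (provider, model) pairs and fails over in a fixed order.

PINNED_ROUTE = [
    ("provider_a", "model-a-2025-06"),
    ("provider_b", "model-b-2025-05"),
]

def complete(prompt, providers):
    # providers: mapping of name -> callable; unavailable ones raise.
    for name, model in PINNED_ROUTE:
        try:
            return providers[name](model, prompt)
        except ConnectionError:
            continue                     # fall through to the next pinned pair
    raise RuntimeError("all pinned providers unavailable")
```

Each fallback target is itself a pinned, validated version, so failover never silently promotes an unevaluated model.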
Practical Security Patterns
Beyond architectural principles, these patterns are implemented across the platform:
- Graceful degradation: When an agent encounters uncertainty, permission errors, or potential security issues, it fails safe—escalating to humans rather than guessing. Safe failure is the default.
- Deterministic versioning: Every agent version is immutable and fully reproducible. You can roll back to any previous version instantly, which means a compromised deployment can be reverted in seconds.
- Canary deployments: Route a fraction of traffic to a new agent version before full promotion. If the canary shows anomalous behavior, the rollout is halted automatically.
- Secret isolation: Secrets are injected at runtime and scoped to specific agent versions and environments. Secrets for staging cannot leak to production, and one agent’s secrets are invisible to another.
- Keys are write-only and never reach the LLM: API keys and credentials are stored encrypted at rest and are never returned from the API once saved—they are write-only. Critically, secrets are never included in the context sent to any LLM provider. The agent references a secret by name; the platform resolves it at the execution layer. This means a prompt injection attack cannot exfiltrate credentials, because the credentials never exist in the model’s context window.
- Multi-tenant data separation: Each customer’s agent data—execution context, conversation history, evaluation datasets, and configuration—is strictly isolated. No cross-tenant data leakage is possible at the platform level.
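The by-name secret reference pattern can be sketched in a few lines. The store layout, tool names, and function signatures are hypothetical; what matters is that the model's context only ever carries the secret's name, and the value is resolved in platform code after the model has produced the call.

```python
# Sketch of execution-layer secret resolution. In practice the store would be
# encrypted at rest; a plain dict stands in for it here.

SECRET_STORE = {("prod", "billing_api_key"): "sk-live-example"}

def build_model_context(tool_call):
    # What the LLM sees: the secret's name only, never its value.
    return {"tool": tool_call["tool"], "auth": tool_call["secret_ref"]}

def execute_tool(tool_call, environment):
    # Resolution happens here, after the model has produced the call, and is
    # scoped to the environment: staging cannot resolve production secrets.
    key = (environment, tool_call["secret_ref"])
    secret = SECRET_STORE.get(key)
    if secret is None:
        raise KeyError("secret not available in this environment")
    return {"tool": tool_call["tool"], "authorized": True}
```

Since the credential never enters the context window, there is nothing for a prompt injection attack to exfiltrate; at worst it can name a secret it cannot read.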
The Philosophy in One Sentence
"Treat agents like employees with access to sensitive systems: verify their work, limit their access, monitor their behavior, and build processes that assume they will occasionally make mistakes."