ADR: Running AI-Generated Code — Lambda MicroVMs vs. Isolated Containers
Architecture decision on how to safely execute AI-agent-generated or user-submitted code in a multi-tenant environment, comparing Lambda MicroVMs (AWS launch June 2026), ephemeral containers with gVisor/Firecracker on ECS/EKS, and standard Lambda. The recommended decision is per-session Lambda MicroVMs orchestrated by Bedrock AgentCore, providing VM-level isolation without the operational overhead of managing your own virtualization stack.
When an AI agent writes and executes arbitrary code inside your system, the threat model changes fundamentally. It is no longer about validating input — it is about containing blast radius. With the launch of Lambda MicroVMs in June 2026, AWS delivered a serverless primitive with VM-level isolation, state preserved across invocations, and sub-second cold start. This ADR documents the decision to adopt it as the untrusted-code execution runtime for the agent platform, and why the alternatives were ruled out.
Case Facts
- System: Multi-tenant AI agent platform (reference scenario)
- New primitive: AWS Lambda MicroVMs, launched June 2026
- Available regions (launch): us-east-1, us-east-2, us-west-2, ap-northeast-1, eu-west-1
- Related launches (Jun/2026): Lambda Managed Instances (managed EC2); async payload 256 KB → 1 MB
- Agent orchestrator: Amazon Bedrock AgentCore
- Relevant AI models: foundation models on Amazon Bedrock (e.g., Claude 3.x, third-party models)
- Application domain: Multi-tenant code interpreter; AI-generated financial simulations
- Primary risk: Sandbox escape across tenants; cross-session data exfiltration
- Decision status: Accepted
The Problem: Multi-Tenant Code Interpreter Is a Containment Problem, Not a Validation Problem
AI agent platforms that execute code — whether Python generated by the model for data analysis, financial simulation scripts, or custom tools invoked via function calling — face a security problem that cannot be solved by input sanitization. Code generated by an LLM is, by definition, not auditable in real time. The model can produce code that attempts to read environment variables, make unexpected syscalls, or — in multi-tenant scenarios — access shared memory structures if isolation is insufficient.
The relevant threat model has three primary vectors: (1) sandbox escape — malicious or malformed code that exits the execution context and reaches the host or other tenants; (2) lateral exfiltration — code that reads data from neighboring sessions via shared memory, temporary filesystem, or environment variables inherited from previous executions on the same worker; (3) unwanted persistence — code that leaves artifacts (processes, files, open connections) that affect subsequent executions.
In the financial context, the risk is even more concrete: imagine an agent generating and executing a Monte Carlo simulation with a client's proprietary portfolio data. If the sandbox does not adequately isolate, a subsequent execution by another tenant may inherit state, variables, or network connections from the previous session. This is not hypothetical — it is exactly the vector that motivated AWS's design of Firecracker for the original Lambda and for Fargate, and that now motivates Lambda MicroVMs as a first-class primitive for user-generated code.
Standard Lambda already runs on Firecracker, and it isolates separate functions and AWS accounts from one another. The gap is inside a single function: the execution model reuses an execution environment across invocations of that function (warm start). A shared interpreter function therefore gives no isolation guarantee between distinct agent sessions of the same platform tenant, let alone between different platform tenants. For a multi-tenant code interpreter, this is unacceptable.
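The warm-start leak described above can be reproduced locally. The following is a minimal sketch, not Lambda itself: the `handler` function, the event shape, and the scratch directory standing in for /tmp are all hypothetical, and two direct calls stand in for two invocations that land on the same reused execution environment.

```python
import os
import tempfile

# Module-level state: in a reused execution environment this dict outlives
# a single invocation, like any global in a handler module.
_CACHE = {}

# Stand-in for /tmp so the sketch runs anywhere without touching real paths.
SCRATCH_DIR = tempfile.mkdtemp(prefix="warm-env-")


def handler(event, context=None):
    """Hypothetical handler that runs work for whichever session calls it."""
    session_id = event["session_id"]

    # Inspect what earlier invocations left behind in this environment.
    leaked_globals = {k: v for k, v in _CACHE.items() if k != session_id}
    leaked_files = sorted(
        name for name in os.listdir(SCRATCH_DIR)
        if not name.startswith(session_id)
    )

    # The current session stores its own working data the same way.
    _CACHE[session_id] = event["payload"]
    with open(os.path.join(SCRATCH_DIR, f"{session_id}.csv"), "w") as fh:
        fh.write(event["payload"])

    return {"leaked_globals": leaked_globals, "leaked_files": leaked_files}


# Two invocations landing on the same warm environment.
first = handler({"session_id": "tenant-a-s1", "payload": "portfolio-A"})
second = handler({"session_id": "tenant-b-s7", "payload": "portfolio-B"})
```

The first call sees nothing left over, while the second call can read both the global entry and the scratch file written for the first session. Cleaning up at the end of each invocation reduces the exposure but depends on the untrusted code cooperating, which is why the ADR treats this as a containment problem.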
The Forces at Play: Why This Decision Is Hard
There is a genuine tension across five axes that cannot all be resolved simultaneously:
Isolation vs. Latency. Strong isolation (VM per session) has initialization cost. The original Firecracker was a major advance in reducing VM cold start from seconds to ~125ms, but it is still orders of magnitude slower than an already-warm container. Lambda MicroVMs promises sub-second launch/resume with state preservation — which changes the calculus for multi-turn agents, where cold start cost amortizes across the session.
Control vs. Operational Burden. Operating your own EKS cluster with Kata Containers or Firecracker gives full control over kernel version, seccomp policies, cgroup limits, and per-pod networking. But that implies a dedicated platform team, patching cycles, capacity management, and an incident runbook that most product teams do not want to maintain. Lambda MicroVMs transfers that burden to AWS — with the trade-off of lock-in and reduced internal visibility.
State vs. Ephemerality. Multi-turn agents need state across steps: Python variables defined in turn 1 must be available in turn 3. Standard Lambda does not reliably preserve state across invocations (the execution environment may be recycled). ECS/EKS-based solutions can keep a container alive per session, but that implies managing that container's lifecycle, scaling, and termination. Lambda MicroVMs resolves this natively with resume semantics.
Cost vs. Security. Standard Lambda is cheaper per invocation. Lambda MicroVMs will probably carry a cost premium for the additional isolation. AWS did not publish specific pricing at launch; the analogy with Fargate (which also uses Firecracker with stronger isolation) suggests a 20-40% premium over standard Lambda for equivalent workloads, which is this ADR's estimate and not a quoted price. Containers on ECS/EKS have more predictable compute cost, but human operational cost frequently exceeds the infrastructure delta.
Regional Coverage vs. Immediate Availability. At launch, Lambda MicroVMs is available in 5 regions. For platforms with data residency requirements in other regions (e.g., sa-east-1 for Brazilian customers under LGPD, ap-southeast-1 for Singapore), this is a real short-term blocker.
Decision Matrix: Options for Untrusted Code Execution
Option 1: Lambda MicroVMs (new primitive, Jun/2026)
- VM-level isolation per session — eliminates the class of cross-tenant escape risk on the same worker
- State preserved across invocations of the same session — enables multi-turn agents without external execution-context storage
- Sub-second launch/resume — cold start amortized across the agent session
- Zero virtualization operational overhead — no managing Firecracker, Kata, kernel versions, or seccomp policies
- Native integration with Bedrock AgentCore, IAM, CloudWatch, X-Ray
- Available in only 5 regions at launch — blocker for data residency requirements in other regions
- Moderate AWS primitive lock-in — future migration requires rewriting the execution runtime
- Pricing not published at launch (estimate: 20-40% premium vs. standard Lambda)
- Less internal hypervisor visibility — dependency on AWS for VM-level security patches
RECOMMENDED for multi-tenant code interpreter. Best balance of isolation, operations, and latency for multi-turn agent sessions.
Option 2: Ephemeral Containers on ECS/EKS with gVisor or Firecracker/Kata
- Full control over the isolation stack — kernel version, seccomp profiles, network policies, cgroup limits
- Multi-cloud portability — same runtime works on GKE, AKS, or on-premises
- Available in all AWS regions — no geographic restriction
- Proven isolation in production (e.g., Google Colab, Kaggle, Judge0) with gVisor/Kata
- High operational burden: manage cluster, isolation runtime, patching, capacity planning, per-session pod autoscaling
- Container + isolation runtime cold start: 2-8s for Kata/Firecracker on EKS (estimate based on public benchmarks)
- Session state requires custom solution (Redis, mounted EFS, or keeping container alive with associated cost)
- Dedicated platform team required — human cost frequently exceeds the infrastructure delta
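- gVisor overhead: estimated 10-30% vs. native execution on syscall- and I/O-heavy workloads, highest in ptrace mode; purely CPU-bound code runs close to native speed because the cost is paid on intercepted syscalls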
Recommended only if: (a) region requirement not covered by Lambda MicroVMs, (b) multi-cloud portability is mandatory, or (c) platform team already exists and granular control is required.
Option 3: Standard Lambda (current execution)
- Lowest cost per invocation — no additional isolation overhead
- Available in all regions — no geographic restriction
- Zero operations — standard Lambda primitive, no additional configuration
- Mature ecosystem: layers, extensions, SnapStart (Java, Python, .NET), native integrations
- SECURITY BLOCKER: execution environment reuse across invocations — global variables, /tmp, and open connections may leak across sessions of the same tenant
- No strong isolation between distinct platform tenants whose sessions share one function's execution environments: unacceptable threat model for a multi-tenant code interpreter
- No guaranteed state preservation across invocations — multi-turn agents require full context serialization/deserialization at each step
- Underlying Firecracker not exposed as a controllable primitive — no per-agent-session isolation guarantees
NOT RECOMMENDED for multi-tenant untrusted code execution. Acceptable only for internal tools with audited code and single tenant.
Architecture Decision
The agent platform needs to execute arbitrary code generated by AI models (Python, analysis scripts, financial simulations) in a multi-tenant environment, with multiple turns per agent session. The isolation requirement is equivalent to that of a financial system: data from one tenant must not be accessible to another tenant under any condition. The June 2026 launch of Lambda MicroVMs offers a primitive that solves all three problems simultaneously — VM isolation, preserved state, and zero operations — in the regions where the platform primarily operates (us-east-1, us-east-2, eu-west-1).
Adopt Lambda MicroVMs as the exclusive runtime for untrusted code execution (agent-generated or user-submitted), with one MicroVM per agent session, orchestrated by Bedrock AgentCore. Each session receives an isolated MicroVM that is launched at session start, preserves state across agent turns, and is terminated at session end or by inactivity timeout. External tools (financial APIs, databases) are accessed via IAM with per-session contextual authorization — the MicroVM has no direct access to long-lived credentials. For regions not covered by Lambda MicroVMs at launch, Option 2 (ECS/EKS with Kata Containers) is the fallback until regional expansion.
- POSITIVE: VM-level isolation eliminates the class of cross-tenant escape risk — the blast radius of malicious or malformed code is contained within the session's MicroVM.
- POSITIVE: State preserved across turns enables agents that iterate on results (e.g., a financial agent that refines a simulation across multiple steps without re-serializing full context).
- POSITIVE: Zero virtualization operational overhead — the product team does not maintain a cluster, isolation runtime, or kernel policies.
- NEGATIVE: Limited regional coverage at launch — regions outside the 5 available require a more operationally complex fallback (ECS/EKS with Kata).
- NEGATIVE: Moderate lock-in on the AWS Lambda MicroVMs primitive — future migration to another provider or custom solution requires rewriting the execution runtime and session model.
- NEGATIVE: Higher per-session cost than standard Lambda (estimate: 20-40% premium) — for platforms with high volume of short sessions, the cost delta must be monitored.
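The regional fallback in this decision amounts to a routing rule evaluated when a session is created. A minimal sketch, assuming the runtime labels `lambda-microvm` and `eks-kata` (hypothetical names for Option 1 and Option 2) and the five launch regions listed in the Case Facts:

```python
# Launch regions as listed in this ADR; update as coverage expands.
MICROVM_REGIONS = frozenset(
    {"us-east-1", "us-east-2", "us-west-2", "ap-northeast-1", "eu-west-1"}
)


def select_runtime(region):
    """Pick the untrusted-code runtime for a session pinned to `region`."""
    if region in MICROVM_REGIONS:
        return "lambda-microvm"
    # Data residency outside the launch regions: fall back to Option 2.
    return "eks-kata"


for region in ("eu-west-1", "sa-east-1"):
    print(region, "->", select_runtime(region))
```

Keeping the region set in one place makes the later revisit of this decision a data change, with no change to the routing logic.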
Connecting to the Financial World: Why VM-Level Isolation Matters in FinTech
In financial systems, the isolation model is not just a technical decision: it is a regulatory requirement. Frameworks such as SOC 2 Type II, PCI DSS, and in Brazil, CMN Resolution 4.893/2021 (cybersecurity policy and cloud-service contracting requirements for institutions supervised by the Central Bank of Brazil) require that data from distinct clients be demonstrably isolated. A multi-tenant code interpreter without strong isolation does not pass a technical controls audit.
Consider the concrete use case: an AI agent that receives portfolio data from an institutional client, generates Python code to calculate VaR (Value at Risk) with Monte Carlo, and executes that code to produce a report. The code generated by the model accesses data that is, by definition, confidential and subject to non-disclosure agreements. If that code runs on a standard Lambda with warm start, there is real risk that global variables or files in /tmp from a previous execution (from another client) are accessible.
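To make the workload concrete, here is the kind of script an agent might generate for that request. It is an illustrative sketch only: the single-factor normal return model, the parameter values, and the function name `monte_carlo_var` are assumptions, not the platform's actual risk methodology.

```python
import random


def monte_carlo_var(portfolio_value, mu, sigma, confidence=0.95,
                    n_paths=10_000, seed=42):
    """One-day Value at Risk from simulated, normally distributed returns."""
    rng = random.Random(seed)
    # Simulated one-day profit and loss for each path, sorted worst first.
    pnl = sorted(portfolio_value * rng.gauss(mu, sigma) for _ in range(n_paths))
    # The loss exceeded in only (1 - confidence) of the simulated paths.
    tail_index = int((1.0 - confidence) * n_paths)
    return -pnl[tail_index]


var_95 = monte_carlo_var(1_000_000, mu=0.0, sigma=0.02)
var_99 = monte_carlo_var(1_000_000, mu=0.0, sigma=0.02, confidence=0.99)
print(f"95% VaR: {var_95:,.0f}   99% VaR: {var_99:,.0f}")
```

Every intermediate value here (the simulated paths, the portfolio value, the result) is client-confidential, which is why where this code runs matters as much as what it computes.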
With Lambda MicroVMs, each client session receives an isolated VM, and the agent's code runs inside it. If the code reads /proc or environment variables, it sees only its own guest kernel and its own session's environment, not the host's. Going further would require breaking the hardware virtualization boundary enforced by KVM and the Firecracker VMM. The blast radius is the session: not the worker, not the neighboring tenant.
Furthermore, state preservation has direct value in multi-step financial workflows: an agent that first loads historical data (turn 1), then calibrates model parameters (turn 2), then runs the simulation (turn 3), and finally generates the report (turn 4) can do all of this within the same MicroVM, without re-serializing gigabytes of data between each step. With standard Lambda, each turn would be a new invocation potentially on a different execution environment, requiring the agent to reload full context — additional latency and cost.
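The multi-turn behavior can be sketched as one namespace that stays alive for the whole session. This is an in-process illustration with a hypothetical `SessionInterpreter` class; in the architecture described here the namespace would live inside the session's MicroVM, and calling `exec` on untrusted code outside such a sandbox is exactly what this ADR argues against.

```python
class SessionInterpreter:
    """Keeps one Python namespace alive for the lifetime of a session."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.namespace = {}
        self.turns = 0

    def run_turn(self, code):
        # exec() mutates self.namespace, so names defined in one turn
        # remain visible in later turns of the same session.
        exec(code, self.namespace)
        self.turns += 1


session = SessionInterpreter("tenant-a-s1")
session.run_turn("prices = [100.0, 101.5, 99.8, 102.3]")              # turn 1: load
session.run_turn("returns = [b / a - 1 for a, b in zip(prices, prices[1:])]")  # turn 2
session.run_turn("mean_return = sum(returns) / len(returns)")         # turn 3
mean_return = session.namespace["mean_return"]
```

With a fresh execution environment per turn, `prices` would have to be serialized after turn 1 and reloaded before turn 2; here it simply stays in memory.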
The async payload expanded to 1 MB (also launched in June 2026) complements this scenario: larger input datasets can be passed directly to the MicroVM without needing an intermediate S3 stage for small-to-medium inputs.
Architecture: Agent Code Execution with Lambda MicroVMs
Untrusted AI-agent-generated code execution flow: from user to Bedrock AgentCore, through the per-session isolated MicroVM, to external tools with contextual authorization and centralized telemetry.
- User / App (Tenant A or B)
- Bedrock AgentCore (orchestrator)
- Foundation Model (Claude / Bedrock)
- IAM: per-session role (STS AssumeRole)
- Secrets Manager: tool credentials
- Lambda MicroVM, session for Tenant A [Firecracker VM], state preserved
- Lambda MicroVM, session for Tenant B [Firecracker VM], state preserved
- S3: portfolio data (tenant-scoped prefix)
- Financial API (market data, risk engine)
- RDS / Aurora: client records (row-level security)
- CloudWatch: logs and metrics, session alarms
- X-Ray: distributed trace per session
What This Decision Does Not Solve: Limits and Residual Risks
Being honest about what an architecture decision does not solve is part of senior work. Lambda MicroVMs solves execution isolation — but it does not solve everything.
Prompt Injection is still a vector. If the model generates code that exfiltrates data via a legitimate channel (e.g., writes portfolio data to a file and uploads it to an attacker-controlled S3 bucket via an authorized API call), VM isolation does not help. The defense here is granular contextual authorization on tools: the MicroVM has permission to write only to the tenant's S3 prefix, not to arbitrary buckets. This requires Bedrock AgentCore to implement per-session permission scoping, not just per-Lambda-function.
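One way to express that scoping is an STS session policy, which narrows whatever the base role allows to the intersection of the two policies. The sketch below is an assumption-laden illustration: the `tenants/<tenant_id>/` key layout, the function names, and the idea of calling STS directly are hypothetical, and it does not show how Bedrock AgentCore itself is configured. `sts_client` is expected to be a boto3 STS client.

```python
import json


def build_session_policy(bucket, tenant_id):
    """IAM session policy limiting S3 access to one tenant's prefix."""
    prefix = f"tenants/{tenant_id}/"  # assumed key layout
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TenantObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                "Sid": "ListTenantPrefixOnly",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
            },
        ],
    }


def assume_session_role(sts_client, role_arn, bucket, tenant_id, session_id):
    """Exchange the base role for credentials narrowed by the session policy."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"{tenant_id}-{session_id}"[:64],  # STS caps names at 64 chars
        Policy=json.dumps(build_session_policy(bucket, tenant_id)),
        DurationSeconds=900,  # shortest lifetime STS accepts
    )
    return response["Credentials"]
```

With credentials issued this way, an upload to an attacker-controlled bucket fails at authorization even though the code making the call is running exactly as the model wrote it.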
Denial of Wallet is a real risk. One MicroVM per session with preserved state means long or zombie sessions accumulate cost. A misconfigured agent (or a malicious user forcing long sessions) can generate unexpected cost. Mitigation: hard session timeout (e.g., 30 minutes of inactivity), CloudWatch alarm on sessions exceeding expected P99 duration, and circuit breaker in AgentCore for sessions with anomalous token consumption.
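The hard-timeout mitigation can be sketched as a small registry that a periodic job sweeps. The `SessionRegistry` class and the `terminate` callback are hypothetical, the 30-minute idle limit comes from the example above, and the 4-hour absolute lifetime cap is an added assumption to bound sessions that stay active indefinitely.

```python
import time


class SessionRegistry:
    """Tracks last activity per session and reaps idle or overlong ones."""

    def __init__(self, idle_timeout_s=30 * 60, max_lifetime_s=4 * 60 * 60,
                 clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self.max_lifetime_s = max_lifetime_s  # assumed cap, not from the ADR
        self._clock = clock
        self._sessions = {}  # session_id -> (started_at, last_seen_at)

    def touch(self, session_id):
        """Record activity; the first touch also records the start time."""
        now = self._clock()
        started_at, _ = self._sessions.get(session_id, (now, now))
        self._sessions[session_id] = (started_at, now)

    def reap(self, terminate):
        """Call terminate(session_id) for every expired session."""
        now = self._clock()
        expired = [
            sid for sid, (started, last_seen) in self._sessions.items()
            if now - last_seen >= self.idle_timeout_s
            or now - started >= self.max_lifetime_s
        ]
        for sid in expired:
            terminate(sid)
            del self._sessions[sid]
        return expired


# Usage with a fake clock so the behavior is visible without waiting.
fake_now = [0.0]
registry = SessionRegistry(clock=lambda: fake_now[0])
registry.touch("s1")
fake_now[0] = 20 * 60
registry.touch("s2")
fake_now[0] = 35 * 60
terminated = []
reaped = registry.reap(terminated.append)  # s1 idle 35 min, s2 idle 15 min
```

In the sweep above only `s1` is terminated. The injected clock is a design choice that keeps the timeout logic testable; in production the callback would call whatever API ends the session's MicroVM.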
Hypervisor visibility is limited. You do not have Firecracker-level access to inspect what happens inside the VM beyond the logs the code produces. In a security incident, forensics is limited to what CloudWatch and X-Ray capture. For platforms with detailed forensic audit requirements, this may be insufficient — in that case, Option 2 (EKS with Kata) provides more visibility via eBPF/Falco.
Regional coverage is a roadmap risk. AWS may expand Lambda MicroVMs to other regions quickly, or it may take quarters. Platforms with customers in uncovered regions need to keep the ECS/EKS fallback operational — meaning the operational burden reduction promised by Option 1 is partial until full regional expansion.
The pricing model is not yet published. All cost analysis here is an estimate. Before committing high-volume workloads to Lambda MicroVMs, it is necessary to validate actual pricing with AWS and build a cost model based on sessions/day, average duration, and compute per session.
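Until real prices exist, the cost model can at least be parameterized so the premium becomes an input to vary. A rough sketch under loud assumptions: billing is modeled as GB-seconds over the whole session duration, `BASE_PRICE` is a placeholder and not a quoted price, and the traffic numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionProfile:
    sessions_per_day: int
    avg_duration_s: float  # average billed session duration in seconds
    memory_gb: float       # memory configured per session


def monthly_cost(profile, price_per_gb_second, premium=0.0, days=30):
    """Monthly compute cost under a simple GB-second pricing model."""
    gb_seconds = (
        profile.sessions_per_day * days
        * profile.avg_duration_s * profile.memory_gb
    )
    return gb_seconds * price_per_gb_second * (1.0 + premium)


# Placeholder inputs: replace with measured traffic and published prices.
profile = SessionProfile(sessions_per_day=10_000, avg_duration_s=120, memory_gb=2)
BASE_PRICE = 0.0000166667  # assumed USD per GB-second, for illustration only

baseline = monthly_cost(profile, BASE_PRICE)
low = monthly_cost(profile, BASE_PRICE, premium=0.20)
high = monthly_cost(profile, BASE_PRICE, premium=0.40)
print(f"baseline ${baseline:,.2f}  +20% ${low:,.2f}  +40% ${high:,.2f}")
```

If preserved state turns out to be billed while a session sits idle, `avg_duration_s` should be the wall-clock session length and not the active compute time, which can change the result far more than the premium does.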
AWS Well-Architected Lens
Security
VM-level isolation per session eliminates the class of cross-tenant escape risk. IAM with STS AssumeRole per session enforces least privilege. Contextual authorization on tools (tenant S3 prefix, database RLS) closes the lateral exfiltration vector via legitimate channel. Dependency on AWS hypervisor patches is acceptable given Firecracker's track record.
Reliability
Limited regional coverage (5 regions at launch) requires ECS/EKS fallback for other regions. Session timeout and explicit MicroVM termination are necessary to avoid zombie sessions. Sub-second resume model reduces timeout risk in multi-turn agents.
Performance efficiency
Cold start amortized across the session — MicroVM launch cost is paid once per session, not per turn. Sub-second resume between turns is comparable to standard Lambda warm start. For sessions with many short turns, the model is more efficient than re-initializing context on each invocation.
Sustainability
Ephemeral per-session MicroVMs avoid compute waste from idle containers. Explicit termination at session end is necessary to avoid cost and energy consumption from zombie sessions.
Lambda MicroVMs is the primitive I have spent years waiting for: a way to solve the multi-tenant code interpreter problem without operating my own virtualization stack. Firecracker was always there, but as an internal implementation detail of Lambda, not as a controllable surface. Now it is a first-class primitive, and that changes the calculus for any platform that needs to execute untrusted code.
What convinces me to recommend it without hesitation for the described use case: VM-level isolation is not a 'nice to have' in a multi-tenant system with financial data; it is the minimum acceptable floor. And state preservation is the real differentiator for multi-turn agents. The alternative of serializing and deserializing full context between turns is not just slower, it is architecturally wrong: you are paying to move data that should stay in memory.
What makes me recommend it with caveats: pricing is not published, regional coverage is limited, and lock-in is real. My practical stance: adopt Lambda MicroVMs for covered regions, keep the ECS/EKS fallback operational for the rest, and revisit the decision in 6 months, when pricing is clear and regional expansion is more advanced.
One detail I see underestimated: contextual authorization on tools is as important as VM isolation. An isolated MicroVM with overly broad IAM permissions still allows exfiltration via a legitimate channel. VM isolation contains the escape, but per-session permission scoping contains tool abuse. You need both. Bedrock AgentCore must be configured to issue temporary roles with minimum scope per session, not Lambda function roles with broad access.
For FinTech teams: this architecture passes the threat model of a SOC 2 audit for customer data isolation in code execution, provided you document the contextual authorization model and session timeout. That is what will appear on the auditor's questionnaire.
Verdict
Lambda MicroVMs is the correct choice for untrusted code execution in multi-tenant agent platforms, in the regions where it is available. The decision is not about AI hype. It is about a concrete security engineering problem (tenant isolation in arbitrary code execution) that now has a first-class serverless solution, without the operational cost of managing your own virtualization.
The cost trade-off (estimate: 20-40% premium vs. standard Lambda) is justified by VM-level isolation and state preservation, especially in financial contexts where the cost of a cross-tenant data leak incident exceeds any compute savings by orders of magnitude. Limited regional coverage at launch is the only real blocker, and requires an operational ECS/EKS fallback for uncovered regions.
The broader architectural lesson: when a new platform primitive simultaneously solves a security problem, a state problem, and an operational problem, without introducing additional complexity, adoption is the rational decision. Lambda MicroVMs does exactly that for the multi-tenant code interpreter problem. The caveat is to validate actual pricing before committing high-volume workloads, and never treat VM isolation as a substitute for granular contextual authorization on the tools the agent can invoke.