# Design Doc: Production Agents with Amazon Bedrock AgentCore

This document describes the architecture of a production AI agent platform using Amazon Bedrock AgentCore, covering serverless runtime with session isolation, tool governance via Gateway/MCP, short- and long-term memory, federated identity, and observability. The focus is on operational guardrails, cost ceilings, and tool governance — the problems that actually sink agent projects in enterprise environments.

- URL: https://fernando.moretes.com/studies/design-doc-bedrock-agentcore-agentes-em-producao

- Markdown: https://fernando.moretes.com/studies/design-doc-bedrock-agentcore-agentes-em-producao/study.md?lang=en

- Type: Design Doc / RFC

- Company: Agent platform (scenario)

- Domain: AI / Agents

- Date: 2026-02-22

- Tags: bedrock, agentcore, ai-agents, aws, serverless, observability, guardrails, mcp

- Reading time: 13 min

---

AI agents in production don't fail because of model capability — they fail due to absent governance, uncontrolled cost, and poorly isolated tools. This RFC proposes a reference architecture using Amazon Bedrock AgentCore that treats these three vectors as first-class citizens, not afterthoughts.

## The Real Problem with Production Agents

Most AI agent projects start the same way: an impressive prototype in a notebook, some ReAct loops working locally, and a demo that convinces stakeholders. The problem surfaces when that agent needs to run in production with multiple simultaneous users, access to real tools (internal APIs, databases, payment systems), and a CFO who wants to know what it will cost at the end of the month.

The technical problems that most frequently sink agent projects aren't about the model itself. They're about **session isolation**: when two users' memory contexts bleed into each other, that is a security incident, not a minor bug. They're about **tool call explosion**: an agent in a loop can make hundreds of calls to an external API before anyone notices, generating costs and potentially violating partner rate limits. They're about **auditability**: when an agent makes a wrong financial decision, you need to reconstruct exactly which reasoning sequence and which data led to that action.

Amazon Bedrock AgentCore, launched in 2025, is AWS's answer to this specific set of problems. It is not an agent framework in the sense of LangChain or CrewAI — it is an infrastructure layer that solves the operational problems those frameworks leave for you to solve. The Runtime offers serverless execution with session isolation by design. The Gateway exposes tools as MCP (Model Context Protocol) endpoints with access control. The Memory store separates short-term memory (within a session) from long-term memory (persisted across sessions). The Identity module integrates with IAM and OIDC providers to ensure the agent acts with the correct user's permissions, not a dangerous shared identity. And native Observability sends structured traces to CloudWatch without manual instrumentation.

This document proposes how to compose these five components into an enterprise agent platform, with explicit guardrails, cost ceilings enforced via quotas and circuit breakers, and tool governance that allows the security team to audit what each agent can do without reading code.

## Goals and Non-Goals

- ✅ GOAL: Define a reference architecture for production AI agents using Bedrock AgentCore with session isolation guaranteed by design
- ✅ GOAL: Establish tool governance — which agents can call which APIs, with which permissions, auditable without reading code
- ✅ GOAL: Implement operational cost ceiling via token quotas, tool call circuit breakers, and budget alerts at the session level
- ✅ GOAL: Ensure complete auditability — every reasoning step, every tool call, every memory access must be traceable in CloudWatch
- ✅ GOAL: Integrate federated identity so the agent operates with end-user permissions, not a shared role
- ❌ NON-GOAL: Define the internal design of the language model or fine-tuning strategies

## Fact Sheet

- **Base platform:** Amazon Bedrock AgentCore (GA 2025)
- **Core components:** Runtime, Gateway, Memory, Identity, Observability
- **Execution model:** Serverless with session isolation via an ephemeral microVM per session
- **Tool protocol:** MCP (Model Context Protocol) via AgentCore Gateway
- **Identity integration:** IAM + OIDC federation via AgentCore Identity
- **Observability:** Native CloudWatch Logs + Traces, no additional SDK
- **Guardrails:** Amazon Bedrock Guardrails integrated with Runtime
- **Reference scenario:** Multi-tenant enterprise platform, financial/operational domain
- **Target regions (estimate):** us-east-1 primary, us-west-2 DR — subject to AgentCore availability

## Proposed Design: The Five Planes of the Architecture

The architecture I propose organizes AgentCore components into five distinct functional planes, each with clear responsibilities and well-defined interfaces. This separation is not merely conceptual — it has direct implications for how you apply security policies, how you scale each component independently, and how you debug when things go wrong.

**Plane 1 — Execution (AgentCore Runtime):** The Runtime is where the agent loop runs. Each user session receives an isolated execution environment — think of it as an ephemeral microVM that is created when the session starts and destroyed when it ends. There is no shared state between sessions at the Runtime level. This solves the context leakage problem by design, without relying on developer discipline. The Runtime accepts agent code as a Docker container, meaning you can use any orchestration framework (LangGraph, custom ReAct loop, etc.) inside it. AgentCore doesn't lock you into a specific orchestration model — it manages lifecycle, isolation, and integration with the other planes.

**Plane 2 — Tools (AgentCore Gateway):** The Gateway exposes tools to the agent as MCP endpoints. The decision to use MCP as the protocol is significant: MCP is an open standard that allows tools to be described in a way the model can understand their capabilities, parameters, and constraints. In the enterprise context, this means you can have a tool catalog approved by the security team, and the agent can only invoke tools in that catalog. The Gateway applies authentication (via Identity) and authorization (via IAM policies) on every tool call. You configure rate limits per tool and per agent — this is the primary circuit breaker mechanism against call explosion.
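
As a rough illustration of the rate limits described above, the following sketch combines a sliding one-minute window with a per-session cap for each agent and tool pair. It is not the Gateway's API: `ToolCallLimiter`, its parameters, and the injected `clock` are hypothetical names chosen only to show the mechanism, and a managed Gateway would enforce the equivalent policy on its own side.

```python
import time
from collections import defaultdict, deque


class ToolCallLimiter:
    """Hypothetical limiter: per-minute window per (agent, tool) plus a per-session cap."""

    def __init__(self, max_per_minute: int, max_per_session: int, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.max_per_session = max_per_session
        self._clock = clock  # injectable so behaviour can be checked without sleeping
        self._recent = defaultdict(deque)        # (agent, tool) -> recent call timestamps
        self._session_counts = defaultdict(int)  # (session, agent, tool) -> accepted calls

    def allow(self, session_id: str, agent_id: str, tool: str) -> bool:
        now = self._clock()
        window = self._recent[(agent_id, tool)]
        while window and now - window[0] >= 60.0:
            window.popleft()  # forget calls older than one minute
        session_key = (session_id, agent_id, tool)
        if len(window) >= self.max_per_minute:
            return False  # per-minute limit reached
        if self._session_counts[session_key] >= self.max_per_session:
            return False  # per-session cap reached: the circuit stays open
        window.append(now)
        self._session_counts[session_key] += 1
        return True
```

A caller would check `allow(...)` before forwarding each tool invocation and hand the agent a structured refusal when it returns `False`, so a looping agent is stopped instead of silently retried.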

**Plane 3 — Memory (AgentCore Memory):** The Memory store has two levels. Short-term memory is the current session context — message history, intermediate results, reasoning state. It is automatically made available to the session's Runtime and destroyed when the session ends. Long-term memory is persisted across sessions — user preferences, learned domain facts, summaries of previous interactions. It is indexed by `userId` and `agentId`, ensuring one user's memory is never accessible to another. The decision about what persists to long-term memory must be explicit in the agent code — not automatic. Automatically persisting everything is a privacy risk and an unnecessary cost.
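
The two memory levels and the explicit-persistence rule can be sketched with plain in-memory structures. This is an illustrative model, not the AgentCore Memory API: `SessionMemory`, `LongTermStore`, `persist`, and `recall` are hypothetical names, and a real store would be a managed, encrypted service.

```python
from dataclasses import dataclass, field


@dataclass
class SessionMemory:
    """Short-term context: exists only while the session object is alive."""
    messages: list = field(default_factory=list)


class LongTermStore:
    """Long-term facts keyed by (user_id, agent_id). Nothing is written implicitly."""

    def __init__(self):
        self._facts = {}

    def persist(self, user_id: str, agent_id: str, fact: str) -> None:
        # The agent must call this on purpose; session messages never flow here on their own.
        self._facts.setdefault((user_id, agent_id), []).append(fact)

    def recall(self, user_id: str, agent_id: str) -> list:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._facts.get((user_id, agent_id), []))
```

Because every read and write carries both identifiers, a lookup for one user cannot return facts stored under another user's key.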

**Plane 4 — Identity (AgentCore Identity):** This is the most frequently underestimated component. In many agent implementations I've seen, the agent runs with a broad IAM role and the end-user identity is just a parameter in the prompt. This is fundamentally wrong for any system accessing sensitive data. AgentCore Identity allows you to federate the end-user's identity (via OIDC) so the agent receives temporary credentials with that specific user's permissions. When the agent calls a tool via Gateway, the call carries the user's identity, not the platform's. This means access control happens in downstream tools, not just in the agent.
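
One way to make "the call carries the user's identity" concrete is to refuse to build a tool call at all when no user token is present. The sketch below assumes a bearer-token header and uses hypothetical names (`ToolRequest`, `MissingUserIdentity`, `build_tool_headers`); it does not reproduce the AgentCore Identity API.

```python
from dataclasses import dataclass
from typing import Optional


class MissingUserIdentity(Exception):
    """Raised instead of falling back to a shared platform identity."""


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    arguments: dict
    user_token: Optional[str]  # assumed to be derived from the end user's OIDC login


def build_tool_headers(request: ToolRequest) -> dict:
    """Return headers for a tool call, failing closed when the user token is absent."""
    if not request.user_token:
        raise MissingUserIdentity(f"no user identity for call to {request.tool}")
    return {"Authorization": f"Bearer {request.user_token}"}
```

The design choice is that there is no default branch: an absent or empty token stops the call, so a misconfigured federation shows up as an error rather than as silent over-privileged access.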

**Plane 5 — Observability (AgentCore Observability):** The Runtime automatically emits structured traces to CloudWatch that include every step of the agent loop: what the input was, what the model's reasoning was, which tool was called, what the result was, how long it took, how many tokens were consumed. These traces are the foundation for auditing, debugging, and cost analysis. I complement this with custom metrics for cost per session (tokens × model price) and alerts when a session exceeds the defined ceiling.
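
The custom cost metric (tokens × model price) is simple arithmetic. In the sketch below the per-1,000-token prices are placeholders to be replaced with the current price of whichever model is used, and `ModelPrice`, `session_cost`, and `exceeds_ceiling` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_1k: float   # currency units per 1,000 input tokens (placeholder value)
    output_per_1k: float  # currency units per 1,000 output tokens (placeholder value)


def session_cost(input_tokens: int, output_tokens: int, price: ModelPrice) -> float:
    """Cost of one session from the token counts reported in its traces."""
    return (input_tokens / 1000) * price.input_per_1k + (output_tokens / 1000) * price.output_per_1k


def exceeds_ceiling(cost: float, ceiling: float) -> bool:
    """True when the session should trigger the cost alert."""
    return cost > ceiling


example_price = ModelPrice(input_per_1k=0.003, output_per_1k=0.015)
example_cost = session_cost(20_000, 4_000, example_price)  # 0.06 + 0.06
```

Publishing the result as a per-session metric is what lets an alarm fire on a single runaway session instead of only on the monthly total.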

## Reference Architecture: Production Agent with AgentCore

Complete flow of an agent request: from client to model, through identity, isolated runtime, tools governed via MCP Gateway, two-level memory, and centralized observability.

### 👤 Client Layer

- End User (browser/app) (user)
- Identity Provider OIDC/Cognito (security)

### 🔐 Identity & Auth

- AgentCore Identity OIDC federation + STS (security)
- IAM Scoped credentials (security)

### ⚡ AgentCore Runtime

- AgentCore Runtime Serverless / session-isolated (compute)
- Bedrock Guardrails Content + topic filters (security)
- Bedrock Model Claude / Titan / etc. (ai)

### 🧰 Tool Governance

- AgentCore Gateway MCP endpoints + rate limits (network)
- Internal APIs (ERP, CRM, DB) (external)
- External APIs (partners, SaaS) (external)

### 🧠 Memory

- Short-term Memory Session context (ephemeral) (data)
- Long-term Memory User facts (persistent) (storage)

### 📊 Observability & Cost Control

- CloudWatch Traces Agent steps + tool calls (data)
- CloudWatch Metrics Tokens + cost/session (data)
- Budget Alarm Cost ceiling enforcement (messaging)

### Flows

- user -> idp: OIDC auth
- idp -> agentcore-identity: identity token
- agentcore-identity -> iam: assume scoped role
- iam -> runtime: temp credentials
- user -> runtime: agent request
- runtime -> guardrails: input/output filter
- runtime -> model: invoke model
- model -> runtime: reasoning + tool calls
- runtime -> gateway: tool invocation (MCP)
- gateway -> tool-internal: authenticated + rate-limited
- gateway -> tool-external: authenticated + rate-limited
- runtime -> mem-short: read/write context
- runtime -> mem-long: persist facts (explicit)
- runtime -> cw-traces: automatic traces
- runtime -> cw-metrics: tokens + latency
- cw-metrics -> budget-alert: cost threshold

## Guardrails, Cost Ceiling, and Tool Governance in Detail

These three mechanisms deserve detailed treatment because this is where most implementations remain superficial — and where incidents happen.

**Guardrails:** Amazon Bedrock Guardrails is configured at the Runtime level and applied at two points: before input reaches the model (prompt injection filtering, PII, prohibited topics) and before output is returned to the user (harmful content filtering, sensitive data). For an enterprise agent, I configure at least four categories: (1) topic filter — the financial support agent should not respond to medical questions, for example; (2) PII filter — card numbers, SSNs, and passwords should never appear in outputs; (3) prompt injection filter — jailbreak attempts via user input are blocked before reaching the model; (4) grounding filter — for RAG agents, Guardrails can verify that the output is grounded in the retrieved documents. Important: Guardrails adds latency. In my tests with Claude 3 Sonnet, the overhead was 80-150ms per call. For interactive use cases, this is acceptable. For batch processing pipelines, you may want to apply guardrails only to the final output.
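
The two application points (before the model and before the response) can be shown with a small wrapper. The regex check here is only a stand-in for a managed guardrail and is far weaker than one; `guarded_call`, `contains_card_number`, and `BLOCKED_MESSAGE` are hypothetical names for illustration.

```python
import re
from typing import Callable

# Stand-in PII rule: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")
BLOCKED_MESSAGE = "This request cannot be processed."


def contains_card_number(text: str) -> bool:
    return CARD_PATTERN.search(text) is not None


def guarded_call(user_input: str,
                 model: Callable[[str], str],
                 violates: Callable[[str], bool]) -> str:
    """Apply the same check on input (before the model) and on output (before the user)."""
    if violates(user_input):
        return BLOCKED_MESSAGE  # the model is never invoked
    output = model(user_input)
    if violates(output):
        return BLOCKED_MESSAGE  # the unsafe output never leaves the runtime
    return output
```

For a batch pipeline, the input check could be dropped and only the final output filtered, which trades some protection for lower latency per call.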

**Cost Ceiling:** The cost of a production agent has three components: input tokens (which grow with context size and injected memory), output tokens (which grow with long responses and verbose reasoning), and tool calls (each call may carry its own cost if the tool is paid). My cost ceiling strategy operates at three levels: (1) **Session level:** each session has a token budget enforced in the Runtime. When the budget is reached, the Runtime returns a structured error that the agent must handle gracefully, not a silent crash. (2) **Tool level:** the Gateway enforces rate limits per tool per agent: at most N calls per session and at most M calls per minute. This prevents infinite tool loops. (3) **Account level:** AWS Budgets with alerts at 80% and 100% of the monthly budget, plus an automated response when the limit is reached, for example a Budgets action that attaches a restrictive IAM policy to the agent roles. The tool circuit breaker in the Gateway is the most critical mechanism: an agent in a loop without one can generate hundreds of dollars in API calls in minutes.
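
The session-level budget with a structured error could look roughly like this. The class names and the payload fields are assumptions made for the example; they are not a documented Runtime setting.

```python
class TokenBudgetExceeded(Exception):
    """Structured error the agent can catch and turn into a graceful reply."""

    def __init__(self, session_id: str, used: int, limit: int):
        super().__init__(f"session {session_id} used {used} of {limit} tokens")
        self.session_id = session_id
        self.used = used
        self.limit = limit

    def to_payload(self) -> dict:
        return {
            "error": "token_budget_exceeded",
            "session_id": self.session_id,
            "used": self.used,
            "limit": self.limit,
        }


class SessionTokenBudget:
    """Hypothetical per-session token budget."""

    def __init__(self, session_id: str, limit: int):
        self.session_id = session_id
        self.limit = limit
        self.used = 0

    def record(self, input_tokens: int, output_tokens: int) -> int:
        """Record usage from one model call and return the remaining tokens."""
        self.used += input_tokens + output_tokens  # tokens are already spent, so always count them
        if self.used > self.limit:
            raise TokenBudgetExceeded(self.session_id, self.used, self.limit)
        return self.limit - self.used
```

The agent loop would call `record(...)` after every model response and, on `TokenBudgetExceeded`, stop issuing calls and return the payload to the client.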

**Tool Governance:** The tool catalog in the Gateway is the interface between the engineering team and the security team. Each registered tool has: (1) an MCP description the model uses to decide when to invoke it; (2) a parameter schema the Gateway validates before executing; (3) an IAM policy defining which agents can invoke that tool; (4) rate limits and maximum timeout; (5) an audit flag determining whether each invocation is logged with full parameters (for high-risk tools) or only with metadata (for low-risk tools). This structure allows the security team to periodically review the catalog without needing to understand the agent code. They see: 'agent X can call tool Y with these parameters, with this rate limit, and each call is audited'. That is governance that scales.
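
The five attributes of a catalog entry map naturally onto a small record type. The sketch below is an assumption about how such an entry could be modelled: `ToolEntry`, `check_invocation`, `audit_record`, and `summarize` are hypothetical, the `allowed_agents` set stands in for an IAM policy, and the schema check is far simpler than a real JSON Schema validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEntry:
    name: str
    description: str             # (1) text the model sees when choosing a tool
    parameter_schema: dict       # (2) parameter name -> expected Python type
    allowed_agents: frozenset    # (3) stand-in for the IAM policy
    max_calls_per_session: int   # (4) rate limit
    timeout_seconds: float       # (4) maximum timeout
    audit_full_parameters: bool  # (5) log full parameters, or metadata only


def check_invocation(entry: ToolEntry, agent_id: str, arguments: dict) -> None:
    """Raise if the agent is not allowed or the arguments do not match the schema."""
    if agent_id not in entry.allowed_agents:
        raise PermissionError(f"{agent_id} may not call {entry.name}")
    if set(arguments) != set(entry.parameter_schema):
        raise ValueError("unexpected or missing parameters")
    for key, expected in entry.parameter_schema.items():
        if not isinstance(arguments[key], expected):
            raise ValueError(f"{key} must be {expected.__name__}")


def audit_record(entry: ToolEntry, agent_id: str, arguments: dict) -> dict:
    """Build the log entry: full parameters for high-risk tools, names only otherwise."""
    record = {"tool": entry.name, "agent": agent_id}
    if entry.audit_full_parameters:
        record["parameters"] = dict(arguments)
    else:
        record["parameter_names"] = sorted(arguments)
    return record


def summarize(entry: ToolEntry) -> str:
    """One line a reviewer can read without opening the agent code."""
    agents = ", ".join(sorted(entry.allowed_agents))
    audit = "full parameters" if entry.audit_full_parameters else "metadata only"
    return (f"{entry.name}: agents [{agents}], "
            f"max {entry.max_calls_per_session} calls/session, "
            f"timeout {entry.timeout_seconds}s, audit {audit}")
```

Running `summarize` over the whole catalog yields the kind of listing a security review can work from directly.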

## Architectural Alternatives Considered

### AgentCore Runtime (proposed)

**Pros**
- Session isolation by design, no additional code
- Native integration with Memory, Identity, Gateway, and Observability
- Managed service — no container infrastructure operation

**Cons**
- Limited regional availability at launch (check roadmap)
- Vendor lock-in to AgentCore interface contracts
- Less flexibility for low-level runtime customization

**Verdict:** Right choice for teams that want to focus on business logic, not infrastructure plumbing

### ECS/Fargate + Lambda (self-managed)

**Pros**
- Full control over runtime and infrastructure
- No dependency on AgentCore-specific APIs
- Portability to other clouds or on-premises

**Cons**
- Session isolation must be implemented manually — source of security bugs
- Structured agent observability requires significant custom instrumentation
- Federated identity integration is complex to implement correctly

**Verdict:** Valid for teams with infrastructure expertise and portability requirements; high engineering cost

### Bedrock Agents (legacy/managed)

**Pros**
- More mature, with more examples and documentation
- Simple integration with Knowledge Bases and Action Groups

**Cons**
- Less control over the reasoning loop — black box for orchestration
- AgentCore is AWS's forward path, so Bedrock Agents is likely to be superseded
- Does not support external orchestration frameworks inside the runtime

**Verdict:** Suitable for simple cases; for complex enterprise production, AgentCore offers more control

### Pure open-source framework (LangGraph + own infra)

**Pros**
- Maximum flexibility and portability
- No managed service costs beyond compute
- Rich integration ecosystem and active community

**Cons**
- All operational problems (isolation, memory, identity, observability) are the team's responsibility
- Significantly longer time to production
- Risk of incorrectly implementing critical security mechanisms

**Verdict:** Correct for product agent platforms (where differentiation is in the infrastructure); wrong for most enterprise use cases

## Decision: Explicit vs. Automatic Persistence in Long-term Memory

**Status:** accepted

**Context**

AgentCore Memory can be configured to automatically persist all session context to long-term memory, or to persist only what the agent explicitly decides to save. The automatic option is simpler to implement; the explicit one requires additional logic in the agent.

**Decision**

Adopt explicit persistence. The agent must have a 'memory consolidation' step at the end of each session where it decides which facts are relevant to persist. This is implemented as an additional model call with a specific fact-extraction prompt.

**Consequences**
- Positive: Precise control over what is stored — reduces privacy risk and storage cost
- Positive: Long-term memory contains curated facts, not conversational noise — improves retrieval quality
- Negative: Adds one model call per session for consolidation — additional cost and latency at session close
- Negative: Requires well-designed consolidation prompt — prompt failures lead to under-persistence of important facts
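
A minimal sketch of the consolidation step, assuming the model is reachable through a plain callable that takes a prompt and returns text. The prompt wording, the `NONE` convention, and the names `consolidate` and `extract` are illustrative assumptions, not a prescribed format.

```python
from typing import Callable

CONSOLIDATION_PROMPT = (
    "From the conversation below, list durable facts about the user, one per line. "
    "Reply with NONE if nothing is worth keeping.\n\n"
)


def consolidate(transcript: list[str], extract: Callable[[str], str]) -> list[str]:
    """Run one extra model call at session close and return the facts to persist."""
    if not transcript:
        return []  # nothing happened, so skip the model call entirely
    reply = extract(CONSOLIDATION_PROMPT + "\n".join(transcript))
    if reply.strip().upper() == "NONE":
        return []
    # Accept plain lines or "- " bullets; drop blank lines.
    facts = [line.strip("- \t") for line in reply.splitlines()]
    return [fact for fact in facts if fact]
```

Only the returned list would be written to long-term memory; the transcript itself is discarded with the session, which is what keeps conversational noise out of the store.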

## Phased Rollout Plan

1. **Phase 0 — Foundation (Weeks 1-2)** — Set up AWS account with IAM guardrails. Create IaC structure (Terraform/CDK) for all AgentCore resources. Define initial tool catalog with security team. Configure CloudWatch Log Groups with retention and KMS encryption. Establish CI/CD pipeline for agent container builds. No agents in production in this phase.

2. **Phase 1 — Pilot Agent with Read-Only Tools (Weeks 3-5)** — Deploy first agent with access only to read-only tools (queries, reports). Configure AgentCore Identity with OIDC. Validate session isolation with concurrency tests (10+ simultaneous sessions). Configure budget alerts. Collect baseline cost and latency metrics. Security review of tool catalog.

3. **Phase 2 — Write Tools with Human Approval (Weeks 6-8)** — Add write tools (record creation, updates) with mandatory human-in-the-loop for high-impact actions. Implement explicit memory consolidation. Configure granular rate limits per tool. Test cost circuit breakers with high-volume synthetic sessions. Audit review of CloudWatch traces.

4. **Phase 3 — Full Production and Expansion (Weeks 9-12)** — Remove human-in-the-loop restrictions for approved low-risk actions. Scale to multiple specialized agents. Implement DR in second region. Establish quarterly tool catalog review process with security team. Document incident runbooks for explosive cost and session leakage scenarios.

> **Critical Risks and Mitigations:** **RISK 1 — Non-Converging Tool Loop:** An agent can enter a loop calling tools repeatedly without progressing toward the goal. Mitigation: circuit breaker in Gateway (max N calls per session per tool) + maximum session timeout in Runtime. Without these two protections, a single malicious user or a prompt bug can generate significant costs.

**RISK 2 — Context Leakage Between Sessions:** If Runtime isolation fails (platform bug, misconfiguration), memory from one session can contaminate another. Mitigation: validate isolation with concurrency tests on every deploy + alerts for cross-userId memory access in memory logs.

**RISK 3 — Silent Shared Identity:** If OIDC federation fails or is not configured correctly, the agent falls back to the platform IAM role. This may not be noticed immediately but represents an access control violation. Mitigation: fail explicitly if the user identity token is not present — never use the platform role as fallback for data operations.

**RISK 4 — AgentCore Regional Availability:** AgentCore is a new service. Regional availability may be limited and SLAs are still being established. Mitigation: architect with fallback to ECS/Lambda execution if Runtime is not available in the required region — but accept that the fallback will not have the same isolation guarantees.

**RISK 5 — Long-term Memory Cost:** Persisted memory grows indefinitely without TTL and cleanup policies. Mitigation: define TTL per memory category + periodic compaction job that uses the model to consolidate old memories into smaller summaries.
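
The TTL half of the Risk 5 mitigation can be sketched as a sweep over stored records. The category names and durations below are invented for the example, and the model-driven compaction job is deliberately left out; `MemoryRecord`, `is_expired`, and `sweep` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Illustrative TTLs per memory category; real values depend on retention policy.
TTL_BY_CATEGORY = {
    "preference": timedelta(days=365),
    "interaction_summary": timedelta(days=90),
}
DEFAULT_TTL = timedelta(days=30)  # applied to categories without an explicit TTL


@dataclass(frozen=True)
class MemoryRecord:
    category: str
    text: str
    created_at: datetime  # timezone-aware


def is_expired(record: MemoryRecord, now: datetime) -> bool:
    ttl = TTL_BY_CATEGORY.get(record.category, DEFAULT_TTL)
    return now - record.created_at > ttl


def sweep(records: list, now: datetime) -> list:
    """Return only the records that are still within their TTL."""
    return [record for record in records if not is_expired(record, now)]
```

Defaulting unknown categories to the shortest TTL means a new memory type is cleaned up aggressively until someone decides how long it should live.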

## Well-Architected Assessment

- **security**: Per-session federated identity via AgentCore Identity + IAM scoped roles eliminates the shared identity anti-pattern. Guardrails applied on input and output. Audited tools with granular IAM policies. KMS encryption for long-term memory and logs. The residual risk is dependence on OIDC federation being correctly configured — silent failure here is the most serious risk vector.
- **reliability**: Serverless runtime with automatic scaling. Tool circuit breakers prevent cascading failures. Session timeout ensures stuck sessions don't consume resources indefinitely. DR in second region for business continuity — but accept that RTO may be in minutes given that session state is ephemeral by design.
- **performance**: Guardrails latency (80-150ms) is the main overhead added by the platform. Short-term memory is in-process — no network latency for session context. Long-term memory retrieval adds ~100-300ms depending on index size. For interactive cases, the dominant bottleneck remains model latency, not infrastructure.
- **cost**: Three cost vectors: model tokens (dominant), external tool calls, and long-term memory storage. Budget alerts at 80%/100% with automatic throttling. Tool circuit breakers as primary protection against cost explosion. Long-term memory with TTL to prevent indefinite growth.
- **sustainability**: Serverless eliminates idle resources — compute only during session execution. Long-term memory with periodic compaction reduces storage. Smaller model selection for simple tasks (complexity-based routing) reduces energy consumption per token.

> **My Senior Perspective: What Really Matters Here:** After 16 years building financial and data systems in production, the thing that worries me most in agent projects isn't model quality — it's the absence of a 'what happens when this goes wrong' mindset. Agents are non-deterministic systems operating in deterministic environments (APIs, databases, payment systems). That tension is where incidents are born.

AgentCore correctly solves the platform problems I've seen teams spend months implementing poorly: session isolation, federated identity, structured observability. That is genuinely valuable. But it doesn't solve the agent design problem itself — and that's where I'd invest more time.

Specifically: **the tool space design is the most important decision in an agent**. Tools that are too broad (an 'execute_sql' tool that accepts any query) create an enormous attack surface. Tools that are too granular (one tool for every possible operation) create an action space the model can't navigate efficiently. The sweet spot is tools with clear business semantics: 'create_payment_request', not 'insert_into_payments_table'. This limits what the agent can do by design, not by policy.
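
To make the contrast concrete, here is a sketch of a narrow tool with business semantics. The currency list, the amount limit, and the field names are invented for illustration; the point is only that the function accepts three typed values and nothing resembling a query.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

ALLOWED_CURRENCIES = {"USD", "EUR", "BRL"}  # illustrative allow-list
MAX_AMOUNT = Decimal("10000")               # illustrative per-request limit


@dataclass(frozen=True)
class PaymentRequest:
    payee_id: str
    amount: Decimal
    currency: str


def create_payment_request(payee_id: str, amount: str, currency: str) -> PaymentRequest:
    """Validate the three business fields and return a request object, or raise ValueError."""
    try:
        value = Decimal(amount)
    except InvalidOperation:
        raise ValueError("amount is not a number") from None
    if not value.is_finite() or value <= 0 or value > MAX_AMOUNT:
        raise ValueError("amount out of range")
    if currency not in ALLOWED_CURRENCIES:
        raise ValueError("unsupported currency")
    if not payee_id or not payee_id.replace("-", "").isalnum():
        raise ValueError("invalid payee id")
    return PaymentRequest(payee_id, value, currency)
```

Whatever the model puts into these arguments, the worst outcome is a rejected or bounded payment request, whereas a generic SQL tool would pass the same text straight to the database.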

On the cost ceiling: I'd treat the token budget per session as an SLO, not a convenience configuration. Define it based on use case analysis: what is the most complex task this agent should be able to complete? Calculate the token consumption for that task. Set the ceiling at 2-3x that value. Any session that exceeds that ceiling is probably in a loop or being abused — it's not a legitimate user with a legitimate task.
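
The sizing rule above is easy to work through in code. The per-turn token counts are made-up figures for a hypothetical worst-case task, and `session_ceiling` is an illustrative helper name.

```python
def worst_case_tokens(turns: list) -> int:
    """Total tokens for the most complex legitimate task, as (input, output) pairs per model call."""
    return sum(input_tokens + output_tokens for input_tokens, output_tokens in turns)


def session_ceiling(turns: list, multiplier: float = 2.5) -> int:
    """Token ceiling set at 2-3x the worst-case task."""
    if not 2.0 <= multiplier <= 3.0:
        raise ValueError("multiplier should stay within the 2-3x range")
    return round(worst_case_tokens(turns) * multiplier)


# Hypothetical worst-case task: three model calls with a growing context.
example_turns = [(3000, 500), (4500, 700), (6000, 900)]
example_total = worst_case_tokens(example_turns)   # 3500 + 5200 + 6900 = 15600
example_ceiling = session_ceiling(example_turns)   # 15600 * 2.5 = 39000
```

Measuring the turn list from real traces of the hardest supported task, and recalculating when the task set changes, keeps the ceiling tied to observed behaviour instead of a guess.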

Finally: AgentCore is new. I would not put a mission-critical system into production on it day zero without an architectural fallback. But for teams building new agent systems, it represents the right approach — managed infrastructure for operational problems, freedom for business logic. It's worth the investment to learn its interface.

## AgentCore Components vs. Self-Managed Implementation

| Capability | AgentCore (managed) | Self-managed (ECS+Lambda) | Self-managed implementation effort (estimate) |
| --- | --- | --- | --- |
| Session isolation | By design, no code | Manual implementation required | 2-4 weeks (high bug risk) |
| Federated identity | AgentCore Identity + native OIDC | Manual OIDC + STS integration | 3-6 weeks (security complexity) |
| Agent observability | Automatic traces to CloudWatch | Custom instrumentation SDK | 4-8 weeks (ongoing maintenance) |
| Tool governance | MCP Gateway with IAM policies | API Gateway + custom Lambda authorizer | 2-3 weeks |
| Multi-level memory | Native short/long-term with userId index | Redis (short) + DynamoDB/OpenSearch (long) | 3-5 weeks |

## Success Metrics and Targets

- **Session isolation:** 0 context leakage incidents between sessions in concurrency tests (10+ parallel sessions)
- **Audit coverage:** 100% of high-risk tool calls with complete trace in CloudWatch
- **Cost ceiling per session:** 0 sessions exceeding 3x the defined budget per task type (the circuit breaker must intervene before that point)
- **Session P95 latency:** < 30s for medium-complexity tasks (estimate based on public Claude 3 Sonnet benchmarks + AgentCore overhead)
- **Agent availability:** ≥ 99.5% (bounded by Bedrock Runtime SLA — check current AWS SLA)
- **Tool catalog review time:** < 1 day for low-risk new tool approval by security team
- **Guardrails coverage:** 100% of inputs and outputs passing through configured filters — no bypass by design

## Verdict

Amazon Bedrock AgentCore represents a genuine shift in the maturity of agent infrastructure on AWS. It is not an agent framework — it is the platform layer that should have existed from the start: session isolation by design, federated identity, tool governance via MCP, and structured observability without manual instrumentation. For teams building enterprise agents, it eliminates weeks of plumbing work that would have been implemented less securely anyway.

But AgentCore solves infrastructure problems, not design problems. The quality of a production agent still fundamentally depends on: (1) a well-designed tool space with clear business semantics; (2) system prompts that define behavior and limits with precision; (3) a memory strategy that distinguishes what should persist from what is session noise; and (4) adversarial testing that actively tries to break the agent before real users do.

My recommendation: adopt AgentCore for new enterprise agent projects. Invest the time saved on infrastructure in careful tool space design and guardrails strategy. Treat the token budget per session as an operational SLO. And maintain a documented architectural fallback while the service matures — not out of distrust, but out of engineering discipline.

## References

- [Amazon Bedrock AgentCore — AWS Product Page](https://aws.amazon.com/bedrock/agentcore/)
- [What is Amazon Bedrock AgentCore — AWS Developer Guide](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/what-is-bedrock-agentcore.html)
