# Inside AI Agents (3/3): Agents in Production on AWS with Bedrock AgentCore

The third and final part of the series takes the elevator down to the technical floor: how to run an AI agent in production on AWS using Amazon Bedrock AgentCore. We cover the full component map (Runtime, Gateway, Memory, Identity, Observability), model selection by cost/latency/reasoning, security with guardrails and session isolation, and FinOps to keep real costs under a few dollars per month.

- URL: https://fernando.moretes.com/studies/agentes-de-ia-por-dentro-3-em-producao-na-aws

- Markdown: https://fernando.moretes.com/studies/agentes-de-ia-por-dentro-3-em-producao-na-aws/study.md?lang=en

- Type: Guide / Deep Dive

- Domain: AI / Agents

- Date: 2026-06-26

- Tags: bedrock, agentcore, aws, ai-agents, serverless, finops, mcp, observability

- Reading time: 10 min

---

In parts 1 and 2 of this series you learned what an AI agent is and how it reasons in loops. Now it is time to put that into production — with real code, real infrastructure, and a real AWS bill. Amazon Bedrock AgentCore is the set of building blocks AWS released in 2025 to solve exactly the problems that surface when you leave the notebook and go to production: isolation between user sessions, secure credentials for external tools, memory that persists across conversations, traceability of every agent decision, and controlled cost on a pay-per-use model. If you come from microservices or data platforms, you will recognize the patterns — just applied to an agent runtime. This document is the technical floor of the elevator: we leave 'why agents' and enter 'how to run this seriously'.

## What You Will Learn

- The full Amazon Bedrock AgentCore component map: Runtime, Gateway (MCP), Memory, Identity, Observability, Browser, and Code Interpreter
- When to use Bedrock Agents vs AgentCore vs frameworks like Strands or LangGraph — and how they compose
- How Web Search / MCP keeps the agent current with recent facts (and why this is the exact mechanism that keeps this site updated)
- Model selection in Bedrock: reasoning × cost × latency and the impact on real TCO
- Serverless event-driven deployment, security guardrails, session isolation, and FinOps to keep costs under a few dollars per month

## Quick Glossary — Terms That Appear in This Lesson

- **AgentCore:** Set of AWS managed services (launched 2025) providing the infrastructure building blocks to run agents in production: runtime, memory, identity, observability, and tools.
- **Runtime (AgentCore Runtime):** Serverless execution environment that runs the agent loop. Each user session is isolated — think of an ephemeral container that is born, executes, and dies per session.
- **MCP (Model Context Protocol):** Open protocol (Anthropic, 2024) that standardizes how a language model calls external tools. It is the 'USB-C of agents': a universal connector between the LLM and any tool.
- **Gateway (AgentCore Gateway):** Component that exposes tools (APIs, Lambda functions, external services) to the agent via MCP. It is the 'API Gateway' of the agent world.
- **Guardrails:** Configurable filters in Bedrock that block unwanted content (forbidden topics, PII, hallucinations) before the response reaches the user. It is the 'WAF of the LLM'.
- **TCO (Total Cost of Ownership):** Total cost of keeping a system running: not just the LLM token bill, but also compute, storage, tool calls, and operations.
- **Strands Agents:** AWS open-source framework for building agents in Python, with native integration to Bedrock AgentCore. Think of it as the 'CDK of agents': a high-level abstraction over the infrastructure.
- **LangGraph:** LangChain framework for orchestrating agents as state graphs. More flexible and more complex than Strands; recommended when the reasoning flow is highly customized.

## The AgentCore Map: Six Pieces That Solve Six Real Problems

When you put an agent into production for the first time, six problems appear almost simultaneously. First: **where does the loop run?** You do not want to manage servers for something that may be idle 99% of the time. Second: **how does the agent call external tools in a standardized way?** Each API has its own schema; without a common protocol, you write glue code forever. Third: **how does the agent remember what happened?** LLMs are stateless by nature — each call starts from zero. Fourth: **how does the agent obtain credentials for tools without you putting secrets in the prompt?** Fifth: **how do you know what the agent decided and why?** Without traceability, debugging an agent is like debugging a program without logs. Sixth: **how does the agent access information that was not in its training data?**

AgentCore solves each of these problems with a dedicated component: **Runtime** (where the loop runs, serverless, isolated per session), **Gateway** (MCP protocol for tools), **Memory** (managed short and long term), **Identity** (tool credentials without secrets in code), **Observability** (traces, metrics, and evaluation), and the managed tools **Browser** and **Code Interpreter** (for browsing the web and executing code safely). Each piece can be used independently — you do not need to adopt everything at once.

## Diagram 1 — Amazon Bedrock AgentCore Capability Map

Static view of AgentCore components and how they relate. Read left (user) to right (external tools), passing through the execution core.

### 👤 Client Layer

- User / App client (user)
- Strands / LangGraph or custom SDK (frontend)

### ⚙️ AgentCore Runtime

- AgentCore Runtime serverless, isolated session (compute)
- AgentCore Memory short-term + long-term (data)
- AgentCore Identity tool credentials / OAuth (security)

### 🤖 Model Layer (Bedrock)

- Bedrock Model Claude / Nova / Llama (ai)
- Bedrock Guardrails content + PII filter (security)

### 🔌 Gateway & Tools

- AgentCore Gateway MCP server / tool registry (network)
- Managed Browser web navigation tool (compute)
- Code Interpreter sandboxed execution (compute)
- Web Search grounding / freshness (external)

### 📊 Observability

- AgentCore Observability traces + evals (data)
- CloudWatch metrics / logs (data)

### 🌐 External Systems

- External APIs CRM, ERP, DBs (external)
- Lambda Functions custom tools (compute)

### Flows

- user -> sdk: invokes
- sdk -> runtime: session
- runtime -> llm: prompt + context
- llm -> guardrails: filtered response
- guardrails -> runtime: safe response
- runtime -> memory: read/write state
- runtime -> identity: requests credential
- runtime -> gateway: tool call (MCP)
- gateway -> browser: browse web
- gateway -> codeint: executes code
- gateway -> websearch: current search
- gateway -> lambda: tool handler
- lambda -> extapi: integrates system
- runtime -> obs: traces / spans
- obs -> cw: metrics

## Bedrock Agents vs AgentCore vs Frameworks: When to Use Each

This is the question that most confuses people arriving at the AWS agent ecosystem. There are three layers, and they **compose** rather than exclude each other.

**Bedrock Agents** is the high-level, fully managed service with a console UI. You define a system prompt, associate a model, connect knowledge bases (RAG) and action groups (tools). It is the fastest path to a working agent — think of it as the 'Amplify of agents': productive for common cases, less flexible for advanced ones.

**AgentCore** is the infrastructure layer. It does not orchestrate the agent — it provides the blocks any orchestrator needs: isolated runtime, managed memory, tool gateway, identity, and observability. You use AgentCore when you need fine control over the reasoning loop or when you are building a multi-agent system.

**Strands Agents** is an AWS open-source Python framework that integrates with AgentCore and can be deployed onto it. It is the code abstraction over the infrastructure: you decorate a Python function with `@tool` and Strands handles the ReAct loop, the Bedrock call, and the AgentCore integration. **LangGraph** is a more flexible and more complex alternative, ideal when the agent state graph is non-linear or when you need explicit control over each state transition.
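To make the `@tool` idea concrete, here is a minimal stdlib-only sketch of what such a decorator does conceptually: it registers a function and derives a tool description from its signature and docstring. The names `tool`, `TOOL_REGISTRY`, `call_tool`, and `get_exchange_rate` are hypothetical, and this is not Strands' actual implementation; the real framework has its own API.

```python
import inspect
from typing import Callable, Dict

# Hypothetical registry: tool name -> description the model could be shown.
TOOL_REGISTRY: Dict[str, dict] = {}


def tool(func: Callable) -> Callable:
    """Register a function as a tool, deriving its spec from the signature."""
    params = inspect.signature(func).parameters
    TOOL_REGISTRY[func.__name__] = {
        "description": (func.__doc__ or "").strip(),
        "parameters": {
            name: getattr(p.annotation, "__name__", str(p.annotation))
            for name, p in params.items()
        },
        "handler": func,
    }
    return func


@tool
def get_exchange_rate(base: str, quote: str) -> float:
    """Return a placeholder exchange rate for a currency pair."""
    return 1.0  # Stub value; a real tool would call an external API.


def call_tool(name: str, **kwargs):
    """Dispatch a tool call the way an agent loop would after a tool_use step."""
    return TOOL_REGISTRY[name]["handler"](**kwargs)
```

The point of the pattern is that the function signature becomes the contract: the model sees the name, description, and parameter types, and the loop dispatches by name.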

The practical rule: start with Bedrock Agents for proofs of concept. Migrate to Strands + AgentCore when you need session isolation, persistent memory, or custom tools. Use LangGraph when the reasoning flow is a complex graph that Strands cannot express.

## Bedrock Agents × AgentCore + Strands × LangGraph — When to Use Each
| Criterion | Bedrock Agents | AgentCore + Strands | AgentCore + LangGraph |
| --- | --- | --- | --- |
| Learning curve | Low (console UI) | Medium (Python SDK) | High (state graphs) |
| Loop control | Managed by AWS | Controlled via decorators | Full control via graph |
| Session isolation | Basic | Native (AgentCore Runtime) | Native (AgentCore Runtime) |
| Multi-agent | Limited | Supported | Native (agent graph) |
| Tools via MCP | Partial (action groups) | Native (AgentCore Gateway) | Native (AgentCore Gateway) |
| Best for | PoC, simple RAG, chatbots | Production, custom tools | Complex flows, advanced orchestration |

## Web Search and MCP: How the Agent Knows What Happened Today

Every language model has a **training cutoff date** — a point in time after which it does not know what happened. It is like hiring a brilliant consultant who spent the last 18 months on an internet-free retreat: they know everything they learned before, but not what came out yesterday. For a production agent, this is a serious problem: prices, regulations, news, API documentation — everything changes.

The solution is **search-based grounding**: instead of relying solely on the model's knowledge, the agent uses a search tool to retrieve current information before responding. **AgentCore Web Search** exposes this capability as an MCP tool — the agent decides when to search, formulates the query, receives the results, and incorporates them into the context before generating the final response. It is exactly the same mechanism a human uses when opening Google in the middle of a conversation.

This site — the architecture portfolio you are reading right now — uses this exact mechanism to stay current. When an agent generates or revises an architecture document, it uses Web Search via MCP to check for new services, price changes, or documentation updates that would invalidate the content. It is a **real meta-example**: the mechanism we are describing is the mechanism that keeps this document relevant.

From a security perspective, Web Search via AgentCore has an advantage over calling search APIs directly: the search API credentials live in **AgentCore Identity**, not in the agent's code. The runtime injects the token at call time — the model never sees the key.
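The credential-injection idea can be sketched in a few lines. This is an illustration under stated assumptions, not AgentCore's API: `_CREDENTIALS` stands in for a managed credential store, and `fake_search_api` and `run_tool` are hypothetical names. What matters is where the secret appears: only in the runtime-side dispatch, never in what the model produces.

```python
from typing import Dict

# Stand-in for a managed credential store; in the text this role is played by
# AgentCore Identity. The key name and value are placeholders.
_CREDENTIALS: Dict[str, str] = {"web_search": "placeholder-token"}


def fake_search_api(query: str, token: str) -> str:
    """Stub for an external search API that requires a token."""
    if not token:
        raise PermissionError("missing token")
    return f"results for {query!r}"


def run_tool(tool_name: str, model_args: dict) -> str:
    """Runtime-side dispatch: the secret is added here, after the model has spoken."""
    token = _CREDENTIALS[tool_name]
    return fake_search_api(token=token, **model_args)


# What the model produces: a tool name and arguments, with no secret anywhere.
model_tool_call = {"name": "web_search", "arguments": {"query": "current Selic rate"}}
result = run_tool(model_tool_call["name"], model_tool_call["arguments"])
```

Because the token never enters the prompt or the tool-call payload, a prompt injection cannot ask the model to reveal it.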

## Diagram 2 — One Agent Session Flow: Request → Runtime → Tools → Response

Temporal flow of a single session. Read top to bottom. Dashed arrows are observability spans captured in parallel.

### 👤 User

- User Request 'What is the current Selic rate?' (user)

### ⚙️ AgentCore Runtime (isolated session)

- Session Init load short-term memory (compute)
- ReAct Step 1 Thought: need current data (ai)
- Tool Decision Action: web_search(query) (ai)
- ReAct Step 2 Observation: search result (ai)
- Final Answer write to memory + respond (compute)

### 🤖 Bedrock Model

- Claude 3.5 Sonnet reasoning + tool_use (ai)
- Guardrails PII + topic filter (security)

### 🔌 Tools via AgentCore Gateway (MCP)

- Web Search Tool AgentCore Web Search (external)
- Memory Store short + long term (data)

### 📊 Observability (parallel)

- Trace Span session_id + step + latency (data)
- Eval Hook groundedness score (data)

### Flows

- u1 -> r1: starts session
- r1 -> mem1: loads context
- r1 -> r2: prompt assembled
- r2 -> llm1: invokes model
- llm1 -> r3: tool_use: web_search
- r3 -> ws1: MCP call
- ws1 -> r4: current result
- r4 -> llm1: observation in prompt
- llm1 -> gr1: final response
- gr1 -> r5: filtered response
- r5 -> mem1: persists summary
- r5 -> u1: delivers response
- r2 -> obs1: span step 1
- r4 -> obs1: span step 2
- r5 -> eval1: evaluates groundedness

## Model Selection, FinOps, and the Real Cost of an Agent in Production

Model choice is the highest-impact decision on an agent's TCO. In Bedrock, you have a spectrum ranging from light, cheap models (Amazon Nova Micro, ~$0.035/1M input tokens) to heavy reasoning models (Claude 3.7 Sonnet with extended thinking, ~$3/1M input tokens). The cost difference is nearly two orders of magnitude (roughly 85× on input tokens), and in an agent with multiple ReAct steps, you pay for each model call.

The rule I use in practice: **use the cheapest model that solves the problem**. For classification, structured data extraction, and simple responses, Nova Micro or Haiku are sufficient. For multi-step reasoning with tools, Claude 3.5 Sonnet is the optimal cost/quality point. For problems requiring deep reasoning (complex financial planning, legal analysis), Claude 3.7 with extended thinking is worth the extra cost.
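One way to make this rule explicit is a routing table from task type to model tier. The sketch below is illustrative: the task categories and the short model labels are hypothetical names chosen for readability, not Bedrock model IDs, and the default tier for unknown tasks is a design assumption.

```python
# Hypothetical task categories mapped to the model tiers named in the text.
MODEL_BY_TASK = {
    "classification": "nova-micro",
    "extraction": "haiku",
    "simple_answer": "haiku",
    "multi_step_tools": "claude-3.5-sonnet",
    "deep_reasoning": "claude-3.7-sonnet-extended-thinking",
}

# Assumption: unknown task types go to the middle tier rather than the most
# expensive one, so a routing gap never silently maximizes cost.
DEFAULT_MODEL = "claude-3.5-sonnet"


def pick_model(task_type: str) -> str:
    """Return the model tier for a task type, falling back to the default."""
    return MODEL_BY_TASK.get(task_type, DEFAULT_MODEL)
```

Keeping the table in code (or configuration) turns model selection into something that can be reviewed and changed deliberately as new models appear.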

In practice, a customer support agent with 1,000 sessions/month, an average of 5 ReAct steps per session, and 2,000 tokens of context per step stays below **$5/month** using Haiku — and below **$50/month** using Sonnet. Idle cost is zero because AgentCore Runtime is serverless: you pay only for actual execution time. This is radically different from keeping an agent server running 24/7.
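The arithmetic behind those figures can be checked with a small estimator. It covers input tokens only, so treat the result as a lower bound: output tokens and tool calls are billed on top. The Nova Micro and Sonnet prices are the ones quoted above; the Haiku price of $0.25 per 1M input tokens is an assumption for illustration, so check current Bedrock pricing before relying on it.

```python
# Input prices in USD per 1M tokens. Nova Micro and Sonnet are the figures
# quoted in the text; the Haiku figure is an assumption for illustration.
PRICE_PER_M_INPUT = {"nova-micro": 0.035, "haiku": 0.25, "sonnet": 3.00}


def monthly_input_cost(model: str, sessions: int = 1_000, steps: int = 5,
                       tokens_per_step: int = 2_000) -> float:
    """Estimate monthly input-token cost: sessions x steps x tokens x price."""
    tokens = sessions * steps * tokens_per_step
    return tokens / 1_000_000 * PRICE_PER_M_INPUT[model]


if __name__ == "__main__":
    for name in PRICE_PER_M_INPUT:
        print(f"{name}: ${monthly_input_cost(name):.2f}/month (input tokens only)")
```

With the defaults, the workload is 10M input tokens per month, which works out to about $2.50 on the assumed Haiku price and $30 on Sonnet. That leaves headroom for output tokens within the $5 and $50 ceilings mentioned above.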

The second cost vector is **tool calls**: each Lambda invocation, each DynamoDB query (for memory), each Web Search call has a cost. AgentCore Observability gives you the visibility to identify which tools are called most frequently and optimize — for example, putting a cache in front of repeated searches.
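A cache in front of repeated searches might look like the following sketch. `TTLCache` is a hypothetical helper, not an AgentCore feature; the injectable `clock` exists only to make expiry easy to test. The time-to-live matters because search results go stale, which is the whole reason the agent searches in the first place.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache tool results by normalized query for a limited time."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}
        self.misses = 0  # Number of calls that reached the paid tool.

    def get(self, query: str) -> str:
        key = query.strip().lower()
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # Fresh entry: no tool call, no cost.
        self.misses += 1
        value = self._fetch(query)
        self._store[key] = (now, value)
        return value
```

Wrapping the search tool as `TTLCache(search_fn, ttl_seconds=300)` means identical queries within five minutes hit the paid API once; the `misses` counter gives a quick read on how much the cache is saving.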

## Defense in Depth: Guardrails, Least Privilege, and Session Isolation

Security in an agent system has three layers that need to work together. The first is **session isolation**: each user runs in a completely separate context in the AgentCore Runtime. There is no memory leakage between sessions — what user A said does not appear in user B's context. This is analogous to process isolation in an OS: each process has its own address space.

The second layer is **Bedrock Guardrails**: configurable filters that inspect both user input and model output. You configure forbidden topics (e.g., 'do not discuss competitors'), PII filters (tax IDs and credit card numbers can be masked or blocked), and offensive content thresholds. Guardrails run **outside** the model: they do not consume model tokens (they are billed separately), and because they are not instructions inside the prompt, injected text in a message cannot simply talk the model into ignoring them. They are still one layer rather than a guarantee, which is why the other two layers matter.
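To show the kind of transformation a PII filter applies to output, here is a regex-based stand-in. It is not how Bedrock Guardrails is implemented or called, and the two patterns (a Brazilian CPF-style tax ID and a 16-digit card number) are assumptions chosen for illustration; a managed guardrail covers far more cases.

```python
import re

# Illustrative patterns only; real PII detection is much broader than this.
PII_PATTERNS = {
    "TAX_ID": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),
    "CREDIT_CARD": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


def mask_pii(text: str) -> str:
    """Replace each matched span with a placeholder naming the PII type."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"{{{label}}}", text)
    return text


masked = mask_pii("CPF 123.456.789-09, card 4111 1111 1111 1111")
```

The important property is placement: the filter runs on the text after the model has produced it, so it applies regardless of what the prompt said.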

The third layer is **least privilege on tools**: the IAM Role of the AgentCore Runtime should have access only to the tools that specific agent needs. If the customer support agent does not need to write to the database, the role has no write permission. AgentCore Identity manages credentials for external tools (third-party APIs, OAuth tokens) — the model never sees these credentials, they are injected by the runtime at call time.
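A least-privilege role for the read-only support agent could carry a policy like the one sketched below, expressed as a Python dict so it can be linted before deployment. The account ID, table name, function name, and `Sid` values are placeholders, and `write_actions` is a deliberately naive check written for this example, not a replacement for proper policy analysis.

```python
import json

# Placeholder ARNs: the account ID, table, and function names are hypothetical.
READ_ONLY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadTicketsOnly",
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/support-tickets",
        },
        {
            "Sid": "InvokeOneTool",
            "Effect": "Allow",
            "Action": ["lambda:InvokeFunction"],
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:lookup-order",
        },
    ],
}

WRITE_PREFIXES = ("Put", "Update", "Delete", "Create", "BatchWrite")


def write_actions(policy: dict) -> list:
    """Naive lint: list allowed actions that look like writes or wildcards."""
    found = []
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        for action in stmt["Action"]:
            verb = action.split(":", 1)[1]
            if verb == "*" or verb.startswith(WRITE_PREFIXES):
                found.append(action)
    return found


policy_json = json.dumps(READ_ONLY_POLICY, indent=2)  # Ready to attach to the role.
```

Running a check like this in CI makes "the role has no write permission" a tested property instead of a convention.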

A pattern I recommend: use **session tags** in IAM so that each tool call carries the user ID. This enables granular auditing in CloudTrail — you can reconstruct exactly which tools were called by which user in which session.
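As a sketch of the session-tag pattern, the function below only builds the parameters for an STS `AssumeRole` call, so it runs without AWS credentials. The role ARN, the tag keys `UserId` and `AgentSessionId`, and the helper name `assume_role_params` are assumptions for illustration.

```python
def assume_role_params(role_arn: str, user_id: str, session_id: str) -> dict:
    """Build AssumeRole parameters that tag the session with user and session IDs."""
    return {
        "RoleArn": role_arn,
        # STS limits the session name to 64 characters.
        "RoleSessionName": f"agent-{session_id}"[:64],
        "Tags": [
            {"Key": "UserId", "Value": user_id},
            {"Key": "AgentSessionId", "Value": session_id},
        ],
    }


params = assume_role_params(
    "arn:aws:iam::123456789012:role/support-agent-tools",  # Placeholder ARN.
    user_id="user-42",
    session_id="sess-0001",
)
# With boto3 installed and credentials configured, the call would be:
#   boto3.client("sts").assume_role(**params)
```

For the tags to be accepted, the role's trust policy also has to allow `sts:TagSession`; without it the `AssumeRole` call is denied.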

## AgentCore in Production — Well-Architected Lens

- **Security**: Session isolation in Runtime, Guardrails for PII and forbidden topics, AgentCore Identity for tool credentials, IAM with least privilege and session tags for CloudTrail auditing.
- **Reliability**: Serverless runtime with automatic retry; Memory with DynamoDB backend (multi-AZ by default); configurable model fallback in Bedrock (e.g., Sonnet → Haiku on throttling).
- **Performance**: Model selection by task (do not use Claude 3.7 for simple classification); tool caching for repeated searches; response streaming to reduce user-perceived latency.
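
The Sonnet → Haiku fallback mentioned under reliability can be sketched at the application level as below. `ThrottledError` is a hypothetical stand-in for whatever throttling exception your client raises, and `invoke` is any callable you supply; this does not show a Bedrock configuration setting.

```python
class ThrottledError(Exception):
    """Stand-in for a provider throttling error."""


def invoke_with_fallback(prompt, invoke, models=("sonnet", "haiku")):
    """Try each model in order; move to the next one only on throttling."""
    if not models:
        raise ValueError("no models configured")
    last_error = None
    for model in models:
        try:
            return model, invoke(model, prompt)
        except ThrottledError as err:
            last_error = err  # Remember the failure and try the next model.
    raise last_error
```

Returning the model name alongside the response lets you record in traces which tier actually answered, which matters when you later compare quality and cost.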

## Where to Start — Mental Checklist for the AWS Architect

1. Define the use case: is it a reactive agent (answers questions) or proactive (executes tasks)? This determines whether you need long-term memory and which tools are required.
2. Start with Bedrock Agents in the console to validate the system prompt and basic tools. Migrate to Strands + AgentCore when you need session isolation or custom tools.
3. Choose the model by problem, not reputation: test Haiku first, move up to Sonnet if quality is insufficient, use extended thinking only for genuinely complex reasoning.
4. Configure Guardrails from day 1: PII masking and forbidden topics are non-negotiable in production. Add content filters even if the use case seems 'safe'.
5. Instrument with AgentCore Observability from the start: distributed traces are the only way to debug reasoning loops in production. Add automated evals (groundedness) as a deploy gate.
6. Model the cost before going to production: estimate sessions/month × ReAct steps × tokens per step × model price. A well-sized agent costs less than $50/month for modest loads.
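
The deploy gate from item 5 can be prototyped before adopting a managed evaluator. The sketch below uses a crude lexical proxy for groundedness (the share of answer tokens that also appear in the retrieved sources), which is an assumption made for illustration and not how AgentCore Observability scores answers. The 0.8 threshold is equally arbitrary.

```python
import re
from typing import Iterable, List


def _tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness(answer: str, sources: Iterable[str]) -> float:
    """Fraction of answer tokens that also occur in at least one source."""
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    source_tokens = set().union(*(_tokens(s) for s in sources))
    return len(answer_tokens & source_tokens) / len(answer_tokens)


def passes_gate(cases: List[dict], threshold: float = 0.8) -> bool:
    """Deploy gate: every evaluation case must meet the threshold."""
    scores = [groundedness(c["answer"], c["sources"]) for c in cases]
    return bool(scores) and min(scores) >= threshold
```

A gate like this runs over a fixed set of recorded sessions in CI. Because agent output is non-deterministic, the gate checks a score against a threshold instead of asserting an exact string.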

> **My Perspective — For Those Making This Transition:** After 16 years building financial and data systems on AWS, what impresses me most about AgentCore is not the technology itself — it is the fact that the problems it solves (isolation, credentials, observability, zero idle cost) are exactly the same problems we solved in microservices a decade ago. The difference is that now the 'service' is a non-deterministic reasoning loop, and that fundamentally changes how you test, monitor, and operate it.

The most common mistake I see architects making when entering this space is treating the agent like a deterministic microservice. It is not. You need to accept that the same input can produce different outputs, and design the system for that: automated evals instead of classic unit tests, reasoning traces instead of transaction logs, guardrails instead of schema validation.

The second mistake is choosing the most powerful model by default. The cost of using Claude 3.7 for everything is prohibitive at scale — and in most cases, Haiku or Sonnet solve the problem at a fraction of the cost. Treat model selection as an architecture decision, not a configuration.

If you come from data platforms or financial systems, you already have the most important skills: thinking about isolation, auditing, cost, and reliability. AgentCore is, at its core, a data platform for reasoning loops. The transition is more natural than it seems.

## Closing the Series: What the Three Parts Build Together

This series traveled the architecture elevator from top to bottom. **Part 1** established the business floor: what an AI agent is, why it is different from a chatbot, and when it makes sense to use one. **Part 2** descended to the design floor: orchestration patterns (ReAct, Chain-of-Thought, multi-agent), how reasoning loops work, and the most common design pitfalls. This **Part 3** reached the technical floor: how to put all of this into production on AWS with isolation, security, observability, and controlled cost.

What the three parts build together is a **complete mental model**: you know why agents exist (Part 1), how they think (Part 2), and how you operate them (Part 3). This mental model is what differentiates an architect who 'uses LLMs' from an architect who 'designs agent systems'.

The next studies in this architecture series go deeper on two themes we only touched here: **multi-agent orchestration** (how to coordinate multiple specialized agents in a coherent system) and **AgentCore in production** (a detailed case study with real cost and latency numbers). If you made it this far, you have the foundation to read those studies with depth.

## Verdict — The Real State of AgentCore in 2025

Amazon Bedrock AgentCore solves real production problems that any team encounters when leaving the notebook. The serverless Runtime with session isolation, the MCP Gateway, managed memory, and observability with traces are mature enough components for production systems in 2025. The pay-per-use cost model is genuinely advantageous for intermittent workloads — a well-sized agent costs less than $50/month for moderate use, with no idle cost.

The real limitations: the ecosystem is still young (many APIs in preview), documentation has gaps, and the learning curve for debugging non-deterministic reasoning loops is real. The choice between Strands and LangGraph depends on the complexity of your reasoning graph — there is no universal answer.

My practical recommendation: if you are building your first production agent on AWS, start with Bedrock Agents to validate the use case, migrate to Strands + AgentCore when you need fine control, and invest in observability from day 1. Model selection is the highest-impact decision on TCO — treat it as an explicit architecture decision, reviewed periodically as new models are released in Bedrock.

## References

- [Amazon Bedrock AgentCore — Product Page](https://aws.amazon.com/bedrock/agentcore/)
- [Amazon Bedrock Agents — Product Page](https://aws.amazon.com/bedrock/agents/)
- [Introducing Web Search on Amazon Bedrock AgentCore — AWS Blog](https://aws.amazon.com/blogs/machine-learning/introducing-web-search-on-amazon-bedrock-agentcore/)
- [Building Effective Agents — Anthropic Engineering](https://www.anthropic.com/engineering/building-effective-agents)
- [The Architecture Elevator — Gregor Hohpe](https://architectelevator.com/)
