Inside AI Agents (3/3): Agents in Production on AWS with Bedrock AgentCore
The third and final part of the series takes the elevator down to the technical floor: how to run an AI agent in production on AWS using Amazon Bedrock AgentCore. We cover the full component map (Runtime, Gateway, Memory, Identity, Observability), model selection by cost, latency, and reasoning depth, security with guardrails and session isolation, and FinOps to keep real costs in the range of a few dollars to a few tens of dollars per month.
In parts 1 and 2 of this series you learned what an AI agent is and how it reasons in loops. Now it is time to put that into production — with real code, real infrastructure, and a real AWS bill. Amazon Bedrock AgentCore is the set of building blocks AWS released in 2025 to solve exactly the problems that surface when you leave the notebook and go to production: isolation between user sessions, secure credentials for external tools, memory that persists across conversations, traceability of every agent decision, and controlled cost on a pay-per-use model. If you come from microservices or data platforms, you will recognize the patterns — just applied to an agent runtime. This document is the technical floor of the elevator: we leave 'why agents' and enter 'how to run this seriously'.
Quick Glossary — Terms That Appear in This Lesson
- AgentCore
- Set of AWS managed services (launched 2025) providing the infrastructure building blocks to run agents in production: runtime, memory, identity, observability, and tools.
- Runtime (AgentCore Runtime)
- Serverless execution environment that runs the agent loop. Each user session is isolated — think of an ephemeral container that is born, executes, and dies per session.
- MCP (Model Context Protocol)
- Open protocol (Anthropic, 2024) that standardizes how a language model calls external tools. It is the 'USB-C of agents': a universal connector between the LLM and any tool.
- Gateway (AgentCore Gateway)
- Component that exposes tools (APIs, Lambda functions, external services) to the agent via MCP. It is the 'API Gateway' of the agent world.
- Guardrails
- Configurable filters in Bedrock that block unwanted content (forbidden topics, PII, hallucinations) before the response reaches the user. It is the 'WAF of the LLM'.
- TCO (Total Cost of Ownership)
- Total cost of keeping a system running: not just the LLM token bill, but also compute, storage, tool calls, and operations.
- Strands Agents
- AWS open-source framework for building agents in Python, with native integration to Bedrock AgentCore. Think of it as the 'CDK of agents': a high-level abstraction over the infrastructure.
- LangGraph
- LangChain framework for orchestrating agents as state graphs. More flexible and more complex than Strands; recommended when the reasoning flow is highly customized.
The AgentCore Map: Six Pieces That Solve Six Real Problems
When you put an agent into production for the first time, six problems appear almost simultaneously. First: where does the loop run? You do not want to manage servers for something that may be idle 99% of the time. Second: how does the agent call external tools in a standardized way? Each API has its own schema; without a common protocol, you write glue code forever. Third: how does the agent remember what happened? LLMs are stateless by nature — each call starts from zero. Fourth: how does the agent obtain credentials for tools without you putting secrets in the prompt? Fifth: how do you know what the agent decided and why? Without traceability, debugging an agent is like debugging a program without logs. Sixth: how does the agent access information that was not in its training data?
AgentCore solves each of these problems with a dedicated component: Runtime (where the loop runs, serverless, isolated per session), Gateway (MCP protocol for tools), Memory (managed short and long term), Identity (tool credentials without secrets in code), Observability (traces, metrics, and evaluation), and the managed tools Browser and Code Interpreter (for browsing the web and executing code safely). Each piece can be used independently — you do not need to adopt everything at once.
Diagram 1 — Amazon Bedrock AgentCore Capability Map
Static view of AgentCore components and how they relate. Read left (user) to right (external tools), passing through the execution core.
- User / App · client
- Strands / LangGraph · or custom SDK
- AgentCore Runtime · serverless, isolated session
- AgentCore Memory · short-term + long-term
- AgentCore Identity · tool credentials / OAuth
- Bedrock Model · Claude / Nova / Llama
- Bedrock Guardrails · content + PII filter
- AgentCore Gateway · MCP server / tool registry
- Managed Browser · web navigation tool
- Code Interpreter · sandboxed execution
- Web Search · grounding / freshness
- AgentCore Observability · traces + evals
- CloudWatch · metrics / logs
- External APIs · CRM, ERP, DBs
- Lambda Functions · custom tools
Bedrock Agents vs AgentCore vs Frameworks: When to Use Each
This is the question that most confuses people arriving at the AWS agent ecosystem. There are three layers, and they compose rather than exclude each other.
Bedrock Agents is the high-level, fully managed service with a console UI. You define a system prompt, associate a model, connect knowledge bases (RAG) and action groups (tools). It is the fastest path to a working agent — think of it as the 'Amplify of agents': productive for common cases, less flexible for advanced ones.
AgentCore is the infrastructure layer. It does not orchestrate the agent — it provides the blocks any orchestrator needs: isolated runtime, managed memory, tool gateway, identity, and observability. You use AgentCore when you need fine control over the reasoning loop or when you are building a multi-agent system.
Strands Agents is an AWS open-source Python framework that uses AgentCore as its backend. It is the code abstraction over the infrastructure — you write @tool in Python and Strands handles the ReAct loop, the Bedrock call, and the AgentCore integration. LangGraph is a more flexible and more complex alternative, ideal when the agent state graph is non-linear or when you need explicit control over each state transition.
The practical rule: start with Bedrock Agents for proofs of concept. Migrate to Strands + AgentCore when you need session isolation, persistent memory, or custom tools. Use LangGraph when the reasoning flow is a complex graph that Strands cannot express.
Bedrock Agents × AgentCore + Strands × LangGraph — When to Use Each
| Criterion | Bedrock Agents | AgentCore + Strands | AgentCore + LangGraph |
|---|---|---|---|
| Learning curve | Low (console UI) | Medium (Python SDK) | High (state graphs) |
| Loop control | Managed by AWS | Controlled via decorators | Full control via graph |
| Session isolation | Basic | Native (AgentCore Runtime) | Native (AgentCore Runtime) |
| Multi-agent | Limited | Supported | Native (agent graph) |
| Tools via MCP | Partial (action groups) | Native (AgentCore Gateway) | Native (AgentCore Gateway) |
| Best for | PoC, simple RAG, chatbots | Production, custom tools | Complex flows, advanced orchestration |
Web Search and MCP: How the Agent Knows What Happened Today
Every language model has a training cutoff date — a point in time after which it does not know what happened. It is like hiring a brilliant consultant who spent the last 18 months on an internet-free retreat: they know everything they learned before, but not what came out yesterday. For a production agent, this is a serious problem: prices, regulations, news, API documentation — everything changes.
The solution is search-based grounding: instead of relying solely on the model's knowledge, the agent uses a search tool to retrieve current information before responding. AgentCore Web Search exposes this capability as an MCP tool — the agent decides when to search, formulates the query, receives the results, and incorporates them into the context before generating the final response. It is exactly the same mechanism a human uses when opening Google in the middle of a conversation.
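The grounding step described above can be sketched in a few lines. This is a minimal illustration, not the AgentCore API: `needs_fresh_data`, `build_grounded_prompt`, and `fake_search` are hypothetical names, the keyword check is a crude stand-in for the model's own decision to search, and the search result is canned.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a keyword check stands in for the model deciding to search.
FRESHNESS_WORDS = {"current", "today", "latest", "now"}


@dataclass
class SearchResult:
    title: str
    snippet: str


def needs_fresh_data(question: str) -> bool:
    """Return True when the question contains a word that hints at recency."""
    words = {w.strip("?.,!'\"") for w in question.lower().split()}
    return bool(words & FRESHNESS_WORDS)


def build_grounded_prompt(
    question: str,
    search: Callable[[str], List[SearchResult]],
) -> str:
    """Retrieve results when needed and place them in the context."""
    if not needs_fresh_data(question):
        return f"Question: {question}"
    results = search(question)
    sources = "\n".join(f"- {r.title}: {r.snippet}" for r in results)
    return (
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
        "Answer using only the sources."
    )


def fake_search(query: str) -> List[SearchResult]:
    # Hypothetical canned result; a real tool would call a search API.
    return [SearchResult("Example source", f"Placeholder snippet for '{query}'")]


prompt = build_grounded_prompt("What is the current Selic rate?", fake_search)
print(prompt)
```

In a real agent the model itself emits the tool call; the point of the sketch is only that retrieved text enters the context before the final answer is generated.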
This site — the architecture portfolio you are reading right now — uses this exact mechanism to stay current. When an agent generates or revises an architecture document, it uses Web Search via MCP to check for new services, price changes, or documentation updates that would invalidate the content. It is a real meta-example: the mechanism we are describing is the mechanism that keeps this document relevant.
From a security perspective, Web Search via AgentCore has an advantage over calling search APIs directly: the search API credentials live in AgentCore Identity, not in the agent's code. The runtime injects the token at call time — the model never sees the key.
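To make the "model never sees the key" idea concrete, here is a small sketch of call-time credential injection. Everything in it is an assumption for illustration: `_VAULT` is an in-memory stand-in for the role the article assigns to AgentCore Identity, and `call_tool_with_credentials` is a hypothetical dispatcher, not an AgentCore function.

```python
from typing import Callable, Dict

# Hypothetical in-memory vault; the article assigns this role to AgentCore Identity.
_VAULT: Dict[str, str] = {"web_search": "secret-token-123"}


def call_tool_with_credentials(
    tool_name: str,
    arguments: Dict[str, str],
    tool: Callable[[Dict[str, str], str], str],
) -> str:
    """Look up the credential and hand it to the tool at call time."""
    token = _VAULT[tool_name]
    return tool(arguments, token)


def web_search(arguments: Dict[str, str], token: str) -> str:
    # The token would authenticate the outbound request; it never enters the result.
    if not token:
        raise PermissionError("a credential is required")
    return f"results for {arguments['query']}"


# What the model produces: a tool name and arguments, with no secret anywhere.
model_tool_call = {"name": "web_search", "arguments": {"query": "Selic rate"}}

observation = call_tool_with_credentials(
    model_tool_call["name"], model_tool_call["arguments"], web_search
)
print(observation)
```

The design choice worth noticing is the direction of data flow: the secret goes from the vault to the tool, while only the tool's result goes back into the model context.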
Diagram 2 — One Agent Session Flow: Request → Runtime → Tools → Response
Temporal flow of a single session. Read top to bottom. Dashed arrows are observability spans captured in parallel.
- User Request · 'What is the current Selic rate?'
- Session Init · load short-term memory
- ReAct Step 1 · Thought: need current data
- Tool Decision · Action: web_search(query)
- ReAct Step 2 · Observation: search result
- Final Answer · write to memory + respond
- Claude 3.5 Sonnet · reasoning + tool_use
- Guardrails · PII + topic filter
- Web Search Tool · AgentCore Web Search
- Memory Store · short + long term
- Trace Span · session_id + step + latency
- Eval Hook · groundedness score
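The session flow in the diagram can be approximated with a context manager that records one span per step. This is a sketch under stated assumptions: `trace_span` and `run_session` are hypothetical helpers, spans go to an in-memory list rather than to AgentCore Observability or CloudWatch, and the tool call is a stub.

```python
import time
import uuid
from contextlib import contextmanager
from typing import Dict, Iterator, List

# In-memory span sink; a real system would export spans to a tracing backend.
spans: List[Dict[str, object]] = []


@contextmanager
def trace_span(session_id: str, step: str) -> Iterator[None]:
    """Record one span with session_id, step name, and latency in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        latency_ms = (time.perf_counter() - start) * 1000
        spans.append(
            {"session_id": session_id, "step": step, "latency_ms": latency_ms}
        )


def run_session(question: str) -> str:
    """Walk through the steps of the diagram with stubbed model and tool calls."""
    session_id = str(uuid.uuid4())
    with trace_span(session_id, "session_init"):
        memory: List[str] = []  # stand-in for loading short-term memory
    with trace_span(session_id, "thought"):
        action = {"tool": "web_search", "query": question}
    with trace_span(session_id, "tool_call"):
        observation = f"stub result for {action['query']}"  # no real search
    with trace_span(session_id, "final_answer"):
        answer = f"Based on the search: {observation}"
        memory.append(answer)  # stand-in for writing to memory
    return answer


answer = run_session("What is the current Selic rate?")
for span in spans:
    print(span["step"], round(span["latency_ms"], 3))
```

Because every span carries the same `session_id`, the steps of one session can be stitched back together later, which is what makes a reasoning trace debuggable.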
Model Selection, FinOps, and the Real Cost of an Agent in Production
Model choice is the highest-impact decision on an agent's TCO. In Bedrock, you have a spectrum ranging from light, cheap models (Amazon Nova Micro, ~$0.035/1M input tokens) to heavy reasoning models (Claude 3.7 Sonnet with extended thinking, ~$3/1M input tokens). The cost difference is close to two orders of magnitude (roughly 86×), and in an agent with multiple ReAct steps you pay for each model call.
The rule I use in practice: use the cheapest model that solves the problem. For classification, structured data extraction, and simple responses, Nova Micro or Haiku are sufficient. For multi-step reasoning with tools, Claude 3.5 Sonnet is the optimal cost/quality point. For problems requiring deep reasoning (complex financial planning, legal analysis), Claude 3.7 with extended thinking is worth the extra cost.
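The rule above maps naturally onto a small routing table. The sketch below is illustrative only: the `Task` categories and `select_model` are hypothetical, and the strings are tier labels taken from the text, not Bedrock model IDs.

```python
from enum import Enum


class Task(Enum):
    CLASSIFICATION = "classification"
    EXTRACTION = "extraction"
    SIMPLE_RESPONSE = "simple_response"
    TOOL_REASONING = "tool_reasoning"
    DEEP_REASONING = "deep_reasoning"


# Tier labels, not Bedrock model IDs; map them to real IDs in your own account.
MODEL_BY_TASK = {
    Task.CLASSIFICATION: "nova-micro",
    Task.EXTRACTION: "haiku",
    Task.SIMPLE_RESPONSE: "haiku",
    Task.TOOL_REASONING: "claude-3.5-sonnet",
    Task.DEEP_REASONING: "claude-3.7-sonnet-extended-thinking",
}


def select_model(task: Task) -> str:
    """Return the cheapest tier the article considers sufficient for the task."""
    return MODEL_BY_TASK[task]


print(select_model(Task.CLASSIFICATION))
print(select_model(Task.TOOL_REASONING))
```

Keeping the mapping in one table makes the model choice reviewable like any other architecture decision, instead of being scattered across call sites.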
In practice, a customer support agent with 1,000 sessions/month, an average of 5 ReAct steps per session, and 2,000 tokens of context per step processes about 10 million input tokens per month (1,000 × 5 × 2,000). That stays below $5/month using Haiku and below $50/month using Sonnet. Idle cost is zero because AgentCore Runtime is serverless: you pay only for actual execution time. This is radically different from keeping an agent server running 24/7.
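The estimate above can be checked with simple arithmetic. In this sketch the Sonnet price (~$3 per 1M input tokens) comes from the text, while the Haiku price of $0.25 per 1M input tokens is my assumption and should be verified against current Bedrock pricing. Only input tokens are counted; output tokens, tool calls, and runtime charges come on top.

```python
def monthly_input_cost(
    sessions: int,
    steps_per_session: int,
    tokens_per_step: int,
    usd_per_million_input_tokens: float,
) -> float:
    """Input-token cost per month; output tokens and tool calls are excluded."""
    total_tokens = sessions * steps_per_session * tokens_per_step
    return total_tokens / 1_000_000 * usd_per_million_input_tokens


# Scenario from the text: 1,000 sessions, 5 ReAct steps, 2,000 tokens per step.
total_tokens = 1_000 * 5 * 2_000

sonnet_cost = monthly_input_cost(1_000, 5, 2_000, 3.00)  # price from the text
haiku_cost = monthly_input_cost(1_000, 5, 2_000, 0.25)   # assumed price

print(f"input tokens per month: {total_tokens:,}")
print(f"Sonnet input cost: ${sonnet_cost:.2f}")
print(f"Haiku input cost:  ${haiku_cost:.2f}")
```

Under these assumptions the input side lands at $30 for Sonnet and $2.50 for Haiku, which leaves headroom for output tokens within the $50 and $5 ceilings quoted above.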
The second cost vector is tool calls: each Lambda invocation, each DynamoDB query (for memory), each Web Search call has a cost. AgentCore Observability gives you the visibility to identify which tools are called most frequently and optimize — for example, putting a cache in front of repeated searches.
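A cache in front of repeated searches can be as small as the sketch below. `TTLCache` and `fake_search` are hypothetical names, the store is an in-process dictionary (a production setup would more likely use a shared cache), and the 300-second TTL is an arbitrary example value.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Cache tool results by normalized query for a limited time."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, query: str) -> str:
        # Normalize case and whitespace so trivially different queries share a key.
        key = " ".join(query.lower().split())
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no paid tool call
        value = self._fetch(query)
        self._entries[key] = (now, value)
        return value


calls: List[str] = []


def fake_search(query: str) -> str:
    # Stand-in for a billable search call; records each invocation.
    calls.append(query)
    return f"results for {query}"


cache = TTLCache(fake_search, ttl_seconds=300)
first = cache.get("Selic rate")
second = cache.get("selic  rate ")  # same query after normalization
print(len(calls))
```

The TTL is the knob that trades freshness for cost: a short TTL suits prices and rates, a long one suits documentation lookups.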
Defense in Depth: Guardrails, Least Privilege, and Session Isolation
Security in an agent system has three layers that need to work together. The first is session isolation: each user runs in a completely separate context in the AgentCore Runtime. There is no memory leakage between sessions — what user A said does not appear in user B's context. This is analogous to process isolation in an OS: each process has its own address space.
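The "no leakage between sessions" property can be illustrated at the data level with a store keyed by session ID. This is only a logical sketch with a hypothetical `SessionMemory` class; the isolation the article attributes to AgentCore Runtime happens at the execution-environment level, which a dictionary cannot reproduce.

```python
from collections import defaultdict
from typing import DefaultDict, List


class SessionMemory:
    """Keeps each session's messages in a separate list keyed by session ID."""

    def __init__(self) -> None:
        self._by_session: DefaultDict[str, List[str]] = defaultdict(list)

    def append(self, session_id: str, message: str) -> None:
        self._by_session[session_id].append(message)

    def context(self, session_id: str) -> List[str]:
        # Return a copy so callers cannot mutate another session's history.
        return list(self._by_session[session_id])


memory = SessionMemory()
memory.append("session-a", "User A: my order number is 1234")
memory.append("session-b", "User B: what are your opening hours?")

context_b = memory.context("session-b")
print(context_b)
```

Whatever the storage backend, the invariant to test is the same: building the context for one session must never read from another session's key.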
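The second layer is Bedrock Guardrails: configurable filters that inspect both user input and model output. You configure forbidden topics (e.g., 'do not discuss competitors'), PII filters (tax IDs and credit card numbers can be masked automatically), and offensive content thresholds. Guardrails run outside the model: they do not consume model tokens (Bedrock bills them separately), and because they are not instructions in the prompt, a prompt injection in the message content cannot simply tell the model to ignore them. They reduce risk rather than eliminate it, so keep the other two layers in place.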
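To show what a filter that runs outside the model looks like, here is a toy stand-in built on regular expressions. It is not the Bedrock Guardrails API: `apply_guardrail`, the two patterns, and the denied-topic list are assumptions for illustration, and real PII detection is considerably more robust than these patterns.

```python
import re
from typing import Tuple

# Simplistic patterns for illustration: 16-digit cards and CPF-style tax IDs.
CARD = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")
TAX_ID = re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b")
DENIED_TOPICS = ("competitor",)


def apply_guardrail(text: str) -> Tuple[bool, str]:
    """Return (allowed, text): block denied topics, otherwise mask PII."""
    lowered = text.lower()
    if any(topic in lowered for topic in DENIED_TOPICS):
        return False, "Sorry, I cannot discuss that topic."
    masked = CARD.sub("[CARD]", text)
    masked = TAX_ID.sub("[TAX_ID]", masked)
    return True, masked


allowed, safe_text = apply_guardrail(
    "Card 4111 1111 1111 1111 and ID 123.456.789-09"
)
print(allowed, safe_text)
```

The same function can wrap both directions of the conversation: once on the user input before the model call, once on the model output before it reaches the user.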
The third layer is least privilege on tools: the IAM Role of the AgentCore Runtime should have access only to the tools that specific agent needs. If the customer support agent does not need to write to the database, the role has no write permission. AgentCore Identity manages credentials for external tools (third-party APIs, OAuth tokens) — the model never sees these credentials, they are injected by the runtime at call time.
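For the read-only support agent mentioned above, a least-privilege policy might look like the following sketch. The helper `read_only_table_policy`, the table name, and the ARN (which uses the AWS documentation placeholder account ID) are hypothetical; the two DynamoDB actions are real read actions, but the exact set your agent needs is an assumption.

```python
import json
from typing import Dict


def read_only_table_policy(table_arn: str) -> Dict[str, object]:
    """Build an IAM policy document that allows reads on one table and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOnlySupportData",
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": table_arn,
            }
        ],
    }


# Hypothetical table; 123456789012 is the placeholder account ID from AWS docs.
policy = read_only_table_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/support-tickets"
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Because IAM denies by default, leaving out `dynamodb:PutItem` and similar actions is enough: a tool call that attempts a write fails at the AWS API, regardless of what the model was persuaded to do.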
A pattern I recommend: use session tags in IAM so that each tool call carries the user ID. This enables granular auditing in CloudTrail — you can reconstruct exactly which tools were called by which user in which session.
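One way to attach the user ID is through STS session tags when the tool-execution role is assumed. The sketch below only builds the request parameters so it runs without AWS credentials; `assume_role_params`, the tag keys, and the role ARN are hypothetical. With boto3 the dictionary would be passed as `boto3.client("sts").assume_role(**params)`, and the role's trust policy must also allow `sts:TagSession`.

```python
from typing import Dict


def assume_role_params(role_arn: str, user_id: str, session_id: str) -> Dict[str, object]:
    """Build keyword arguments for an STS AssumeRole call with session tags."""
    return {
        "RoleArn": role_arn,
        # RoleSessionName is limited to 64 characters.
        "RoleSessionName": f"agent-{session_id}"[:64],
        "Tags": [
            {"Key": "user_id", "Value": user_id},
            {"Key": "agent_session_id", "Value": session_id},
        ],
    }


# Hypothetical role; 123456789012 is the placeholder account ID from AWS docs.
params = assume_role_params(
    "arn:aws:iam::123456789012:role/support-agent-tools",
    user_id="user-42",
    session_id="3f2a9c",
)
print(params["RoleSessionName"], params["Tags"])
```

The session name and tags travel with the temporary credentials, which is what lets an audit trail tie a tool call back to a user and a session.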
AgentCore in Production — Well-Architected Lens
Security
Session isolation in Runtime, Guardrails for PII and forbidden topics, AgentCore Identity for tool credentials, IAM with least privilege and session tags for CloudTrail auditing.
Reliability
Serverless runtime with automatic retry; Memory with DynamoDB backend (multi-AZ by default); configurable model fallback in Bedrock (e.g., Sonnet → Haiku on throttling).
Performance efficiency
Model selection by task (do not use Claude 3.7 for simple classification); tool caching for repeated searches; response streaming to reduce user-perceived latency.
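The model fallback mentioned under Reliability (Sonnet to Haiku on throttling) can be sketched as an ordered list of models tried in turn. All names here are hypothetical: `ThrottledError` stands in for whatever throttling error your model client raises, and `fake_invoke` simulates a throttled primary model instead of calling Bedrock.

```python
from typing import Callable, Optional, Sequence, Tuple


class ThrottledError(Exception):
    """Hypothetical error raised by the model client when it is throttled."""


def invoke_with_fallback(
    prompt: str,
    models: Sequence[str],
    invoke: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order; move to the next one only on throttling."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, invoke(model, prompt)
        except ThrottledError as error:
            last_error = error
    raise RuntimeError("all models were throttled") from last_error


def fake_invoke(model: str, prompt: str) -> str:
    # Simulated client: the primary model is always throttled.
    if model == "sonnet":
        raise ThrottledError("rate exceeded")
    return f"[{model}] answer to: {prompt}"


used_model, reply = invoke_with_fallback("Hello", ["sonnet", "haiku"], fake_invoke)
print(used_model, reply)
```

Returning the model that actually answered matters for observability: a silent downgrade changes answer quality, so the trace should record it.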
Where to Start — Mental Checklist for the AWS Architect
After 16 years building financial and data systems on AWS, what impresses me most about AgentCore is not the technology itself. It is the fact that the problems it solves (isolation, credentials, observability, zero idle cost) are exactly the same problems we solved in microservices a decade ago. The difference is that now the 'service' is a non-deterministic reasoning loop, and that fundamentally changes how you test, monitor, and operate it.
The most common mistake I see architects making when entering this space is treating the agent like a deterministic microservice. It is not. You need to accept that the same input can produce different outputs, and design the system for that: automated evals instead of classic unit tests, reasoning traces instead of transaction logs, guardrails instead of schema validation.
The second mistake is choosing the most powerful model by default. The cost of using Claude 3.7 for everything is prohibitive at scale, and in most cases Haiku or Sonnet solve the problem at a fraction of the cost. Treat model selection as an architecture decision, not a configuration.
If you come from data platforms or financial systems, you already have the most important skills: thinking about isolation, auditing, cost, and reliability. AgentCore is, at its core, a data platform for reasoning loops. The transition is more natural than it seems.
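The shift from classic unit tests to automated evals can be shown with a tiny harness. Instead of asserting one exact output, it runs the same question several times and measures how often a property of the answer holds. `pass_rate` and `make_flaky_agent` are hypothetical names, and the seeded random agent is a stand-in for a real non-deterministic model call.

```python
import random
from typing import Callable


def pass_rate(
    agent: Callable[[str], str],
    check: Callable[[str], bool],
    question: str,
    runs: int,
) -> float:
    """Fraction of runs whose output satisfies the check."""
    passed = sum(1 for _ in range(runs) if check(agent(question)))
    return passed / runs


def make_flaky_agent(seed: int) -> Callable[[str], str]:
    """Stand-in agent whose phrasing varies from run to run."""
    rng = random.Random(seed)
    phrasings = ["The answer is 4.", "2 + 2 = 4", "It equals 4"]

    def agent(question: str) -> str:
        return rng.choice(phrasings)

    return agent


# Property-based check: the answer must contain the right value, in any phrasing.
rate = pass_rate(make_flaky_agent(7), lambda out: "4" in out, "What is 2 + 2?", runs=20)

# Exact-match check: the kind of assertion that breaks on non-deterministic output.
strict_rate = pass_rate(
    make_flaky_agent(7), lambda out: out == "The answer is 4.", "What is 2 + 2?", runs=20
)
print(rate, strict_rate)
```

A release gate then becomes a threshold on the pass rate (for example, at least 0.95 over a fixed question set) rather than a single green or red assertion.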
Closing the Series: What the Three Parts Build Together
This series traveled the architecture elevator from top to bottom. Part 1 established the business floor: what an AI agent is, why it is different from a chatbot, and when it makes sense to use one. Part 2 descended to the design floor: orchestration patterns (ReAct, Chain-of-Thought, multi-agent), how reasoning loops work, and the most common design pitfalls. This Part 3 reached the technical floor: how to put all of this into production on AWS with isolation, security, observability, and controlled cost.
What the three parts build together is a complete mental model: you know why agents exist (Part 1), how they think (Part 2), and how you operate them (Part 3). This mental model is what differentiates an architect who 'uses LLMs' from an architect who 'designs agent systems'.
The next studies in this architecture series go deeper on two themes we only touched here: multi-agent orchestration (how to coordinate multiple specialized agents in a coherent system) and AgentCore in production (a detailed case study with real cost and latency numbers). If you made it this far, you have the foundation to read those studies with depth.
Verdict — The Real State of AgentCore in 2025
Amazon Bedrock AgentCore solves real production problems that any team encounters when leaving the notebook. The serverless Runtime with session isolation, the MCP Gateway, managed memory, and observability with traces are mature enough components for production systems in 2025. The pay-per-use cost model is genuinely advantageous for intermittent workloads — a well-sized agent costs less than $50/month for moderate use, with no idle cost. The real limitations: the ecosystem is still young (many APIs in preview), documentation has gaps, and the learning curve for debugging non-deterministic reasoning loops is real. The choice between Strands and LangGraph depends on the complexity of your reasoning graph — there is no universal answer. My practical recommendation: if you are building your first production agent on AWS, start with Bedrock Agents to validate the use case, migrate to Strands + AgentCore when you need fine control, and invest in observability from day 1. Model selection is the highest-impact decision on TCO — treat it as an explicit architecture decision, reviewed periodically as new models are released in Bedrock.