ADR: Adopting Amazon Bedrock AgentCore in Production
Bedrock AgentCore promises to reduce the operational friction of running AI agents in production, but adopting any managed agent orchestration platform demands an explicit architectural decision. In this ADR, I document the forces that drove me to evaluate AgentCore, the alternatives considered, and the real consequences of each path.
After 16 years building financial platforms on AWS, I've learned that the most dangerous question in architecture isn't 'does this work?' — it's 'who operates this at 2 AM when it breaks?' Bedrock AgentCore is AWS's answer to the problem of operationalizing AI agents beyond the notebook: managed runtime, memory, tool-use, guardrails, and traceability in a single control plane. This ADR documents how I arrived at the decision to adopt it — or not — in a regulated financial environment, and the consequences you need to internalize before doing the same.
Context and Forces
The scenario that motivated this decision is recurring in financial institutions: a product team wants to expose an AI agent to internal analysts — capable of querying market data via API, running risk calculations in Lambda, retrieving context from regulatory documents via RAG, and recording every action in an immutable audit trail. The MVP worked in two sprints with LangChain + Claude via Bedrock. The problem surfaced the following week.
Five forces made the decision urgent:
- Cross-turn state management: financial agent sessions last minutes, not seconds; reliably maintaining context in stateless Lambda is brittle.
- Regulatory traceability: every tool call, every model decision, every response must be auditable with timestamp, identity, and full payload, without relying on ad-hoc logging.
- Guardrails as contract: in finance, the agent cannot leak PII, cannot recommend products without disclaimers, cannot execute irreversible actions without human confirmation. Implementing this manually in every agent is guaranteed technical debt.
- Unpredictable token cost: without per-session budget control, a faulty agent loop can consume tens of dollars in minutes.
- Runtime ownership: the platform team doesn't want to maintain a custom agent scheduler; they want an SLA contract with AWS.
Options Considered
Option A: Self-hosted LangChain/LangGraph on EKS
Pros:
- Full control over execution graph and retry logic
- Model portability: swap LLM without platform change
- Mature ecosystem of community integrations and tools
Cons:
- Full operational responsibility: scaling, HA, patching, observability
- Guardrails and audit trail must be built and maintained by the team
- Session memory management requires custom DynamoDB or Redis
- High engineering cost to reach parity with managed features
Assessment: Suitable for teams with a mature AI platform; high operational risk for smaller teams
Option B: Bedrock Agents (prior generation, without AgentCore)
Pros:
- AWS-managed, no runtime infrastructure to operate
- Native integration with Knowledge Bases and Action Groups
Cons:
- Limited observability: partial traces, no native span-level detail
- No native per-session budget control
- Agent loop customization restricted to what AWS exposes
Assessment: Good for simple cases; observability limitations are blockers in finance
Option C: Amazon Bedrock AgentCore
Pros:
- Managed runtime with native persistent session memory (AgentCore Memory)
- Configurable guardrails as declarative policy, not inline code
- Native traceability via CloudTrail + X-Ray with tool-call spans
- AgentCore Gateway for tool-use with OAuth2/OIDC and per-tool throttling
- Configurable per-session token budget control
Cons:
- Platform lock-in to AWS for the agent runtime
- Execution graph customization more restricted than LangGraph
- New service: API surface still evolving, conservative quotas
- AgentCore Memory and Gateway costs added on top of inference cost
Assessment: Recommended decision for regulated financial environments with a lean platform team
Option D: Step Functions + Lambda as agent orchestrator
Pros:
- Native audit via Step Functions execution history
- Declarative and testable retry, timeout, and error handling
- No new service to learn: team already knows the pattern
Cons:
- Not an agent runtime: each 'turn' requires a new execution or .waitForTaskToken
- Session memory and model context must be managed externally
- Cold-start and state transition latency can be noticeable in dialogues
Assessment: Excellent for deterministic workflows; inadequate as a conversational agent runtime
The Decision and the Reasoning Behind It
The decision was to adopt Bedrock AgentCore as the primary agent runtime, with Step Functions as the orchestrator for adjacent deterministic workflows (approvals, reconciliations, notifications). This is not an all-or-nothing decision: AgentCore solves the non-deterministic agent loop problem, while Step Functions remains the right choice for the deterministic business process that wraps the agent.
The decisive argument was the AgentCore Gateway with per-tool OAuth2/OIDC support. In a financial environment, every tool-call is an action with identity: who authorized it, what scope, with which token. Implementing this manually in LangChain would mean building and maintaining an authorization proxy — exactly the kind of infrastructure that generates no business value but generates security incidents when neglected. The Gateway delivers this as declarative configuration, with per-tool throttling (e.g., maximum 10 calls/session for the order execution API) and a native circuit breaker.
The second argument was session memory with configurable TTL. AgentCore Memory persists conversation context in a managed store, with per-session configurable TTL and KMS customer-managed key (CMK) encryption. For LGPD/GDPR compliance, this means I can configure a 24h TTL for analyst sessions and guarantee that no session data persists beyond what's necessary — without building a custom expiration pipeline.
The lock-in trade-off was consciously accepted: the tool-use layer (the Lambda functions that execute the actual actions) remains completely portable. If we need to migrate the runtime in the future, the tools keep working.
Financial Agent Architecture with Bedrock AgentCore
Execution flow of a financial analysis agent: from analyst to AgentCore runtime, through guardrails, tool-use via Gateway, session memory, and observability
- API Gateway · REST + Cognito JWT
- Bedrock Guardrails · PII filter + topic deny
- AgentCore Runtime · Claude 3.5 Sonnet
- AgentCore Memory · TTL=24h, KMS CMK
- AgentCore Gateway · OAuth2/OIDC, throttle
- Lambda: Market Data · Bloomberg API proxy
- Lambda: Risk Calc · VaR engine
- Knowledge Base · OpenSearch + S3
- X-Ray · span per tool-call
- CloudTrail · API audit log
- CloudWatch · SLO dashboards
Concrete Configuration: What Actually Matters
Adopting AgentCore without properly configuring operational controls is worse than not adopting it — you gain a false sense of security without active guardrails. Here are the configurations that make a real difference:
Guardrails as first line: Configure contentPolicyConfig with HATE, INSULTS, SEXUAL, VIOLENCE all set to BLOCK, and sensitiveInformationPolicyConfig with PII filters for CREDIT_DEBIT_CARD_NUMBER, AWS_ACCESS_KEY, NAME, and EMAIL in ANONYMIZE mode. In a financial environment, add topicPolicyConfig with explicitly denied topics: "investment advice without disclaimer", "guaranteed returns". This isn't paranoia — it's the minimum to pass a compliance review.
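As a minimal sketch of the guardrail settings above, the payload below is shaped for the `create_guardrail` call on the boto3 `bedrock` client. The guardrail name, topic definitions, and blocked-message strings are hypothetical. In the API versions I am aware of, content filters take a strength rather than a literal BLOCK value, so HIGH is assumed here as the closest equivalent. The block only builds the request; nothing is sent to AWS.

```python
from typing import Any

# Hypothetical values: the topic definitions and messages are illustrative only.
DENIED_TOPICS = {
    "investment advice without disclaimer":
        "Recommending financial products or strategies without the required disclaimer.",
    "guaranteed returns":
        "Promising or implying a guaranteed return on any investment.",
}
CONTENT_FILTERS = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE"]
PII_ENTITIES = ["CREDIT_DEBIT_CARD_NUMBER", "AWS_ACCESS_KEY", "NAME", "EMAIL"]


def build_guardrail_request(name: str) -> dict[str, Any]:
    """Build keyword arguments shaped for the Bedrock CreateGuardrail API."""
    return {
        "name": name,
        "blockedInputMessaging": "This request is outside the assistant's policy.",
        "blockedOutputMessaging": "The response was withheld by policy.",
        "contentPolicyConfig": {
            "filtersConfig": [
                # Assumption: HIGH strength stands in for "set to BLOCK".
                {"type": t, "inputStrength": "HIGH", "outputStrength": "HIGH"}
                for t in CONTENT_FILTERS
            ]
        },
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [
                {"type": t, "action": "ANONYMIZE"} for t in PII_ENTITIES
            ]
        },
        "topicPolicyConfig": {
            "topicsConfig": [
                {"name": n, "definition": d, "type": "DENY"}
                for n, d in DENIED_TOPICS.items()
            ]
        },
    }


request = build_guardrail_request("financial-agent-guardrail")
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("bedrock").create_guardrail(**request)
```

Keeping the policy as data rather than inline code means the same dictionary can be reviewed by compliance and diffed in version control.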
AgentCore Memory with correct partitioning: The memory partition key must be userId + sessionId, never just sessionId. In multi-tenant environments, sessions from different users with the same sessionId collided in testing — a silent bug that leaks context between users. Configure memoryConfiguration.enabledMemoryTypes with SESSION_SUMMARY for long sessions, reducing context token consumption by up to 40% in sessions exceeding 20 turns.
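To illustrate the partitioning rule, here is a small sketch of a composite key built from userId and sessionId. The function name and the in-memory dictionary are hypothetical stand-ins; this is not the AgentCore Memory API, only the keying idea.

```python
def memory_partition_key(user_id: str, session_id: str) -> str:
    """Compose a memory key scoped to both the user and the session."""
    if not user_id or not session_id:
        raise ValueError("user_id and session_id are both required")
    # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
    return f"{len(user_id)}:{user_id}|{len(session_id)}:{session_id}"


# Stand-in for the managed store: two users who happen to share a sessionId.
store: dict[str, list[str]] = {}
store.setdefault(memory_partition_key("alice", "s-001"), []).append("VaR query")
store.setdefault(memory_partition_key("bob", "s-001"), []).append("FX exposure")
```

With sessionId alone as the key, both entries would land in the same list, which is exactly the cross-user leak described above.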
Gateway with per-tool throttling: Define separate rateLimit for each Action Group. The order execution API should have maxRequestsPerSession: 5 and requireConfirmation: ENABLED. The market data query API can have maxRequestsPerSession: 50. Without this granularity, a faulty agent loop can execute dozens of orders before being detected — a scenario I've seen happen in production with frameworks lacking tool-use controls.
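The behavior I expect from per-tool throttling can be sketched in plain Python. This is an illustration of the control, not the Gateway's implementation; the class name and the `MarketDataQuery` tool name are hypothetical, and the limits mirror the values suggested above.

```python
from collections import Counter


class ToolCallLimiter:
    """Tracks tool calls per session and enforces a per-tool ceiling."""

    def __init__(self, limits: dict[str, int]) -> None:
        self._limits = limits
        self._counts = Counter()  # keyed by (session_id, tool_name)

    def allow(self, session_id: str, tool_name: str) -> bool:
        """Return True and record the call if the tool is under its limit."""
        limit = self._limits.get(tool_name, 0)  # unknown tools are denied
        key = (session_id, tool_name)
        if self._counts[key] >= limit:
            return False
        self._counts[key] += 1
        return True


limiter = ToolCallLimiter({"ExecuteOrder": 5, "MarketDataQuery": 50})
# A faulty loop tries to place seven orders in one session.
results = [limiter.allow("s-001", "ExecuteOrder") for _ in range(7)]
```

Here the sixth and seventh attempts are refused, and a second session starts with its own fresh counter.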
Per-session token budget: Configure sessionConfiguration.maxTokens with a conservative initial value — I recommend 50,000 tokens for typical analysis sessions. Monitor the p95 token consumption per session in CloudWatch and adjust. An agent entering a reasoning loop can consume 200k+ tokens in a single session without this control.
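For teams that want the same protection at the application layer, a per-session budget can be sketched as a running counter with a hard ceiling. The class below is hypothetical and independent of any AgentCore setting; the per-turn token counts are made-up numbers for the simulation.

```python
class SessionTokenBudget:
    """Running token counter for one session with a hard ceiling."""

    def __init__(self, max_tokens: int = 50_000) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one turn's usage; raise once the ceiling is crossed."""
        self.used += input_tokens + output_tokens
        if self.used > self.max_tokens:
            raise RuntimeError(
                f"session budget exceeded: {self.used}/{self.max_tokens} tokens"
            )


budget = SessionTokenBudget()
turns_completed = 0
try:
    for _ in range(100):  # simulate a runaway reasoning loop
        budget.charge(input_tokens=6_000, output_tokens=2_000)
        turns_completed += 1
except RuntimeError:
    pass  # a real caller would end the session and emit a metric here
```

At 8,000 tokens per turn the loop is cut off on the seventh turn instead of running all 100 iterations.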
Observability: What to Measure and How
AI agents have a different observability profile from traditional APIs. p99 latency is less useful than turns-per-session distribution and tool-call failure rate per tool. Here is the observability model I implemented:
Agent business metrics (via CloudWatch custom metrics with namespace FinancialAgent):
- TurnsPerSession: histogram; alert if p95 > 15 turns (indicates loop or poorly calibrated prompt)
- TokensPerSession: histogram; alert if p95 > 40k tokens
- ToolCallFailureRate per ToolName: counter; SLO of < 1% failure for critical tools
- GuardrailInterventionRate: counter; spike indicates jailbreak attempt or prompt injection
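One way to publish these metrics is to emit a data point per finished session and let CloudWatch compute the percentiles. The sketch below shapes a payload for the CloudWatch `put_metric_data` call. The function name, the `MarketDataQuery` tool name, and the choice to report the failure rate as a percentage per session are assumptions, not part of my production setup.

```python
from typing import Any


def session_metrics(turns: int, tokens: int,
                    tool_calls: dict[str, tuple[int, int]],
                    guardrail_interventions: int) -> dict[str, Any]:
    """Shape one finished session as PutMetricData keyword arguments.

    tool_calls maps tool name -> (total calls, failed calls).
    """
    data: list[dict[str, Any]] = [
        {"MetricName": "TurnsPerSession", "Value": turns, "Unit": "Count"},
        {"MetricName": "TokensPerSession", "Value": tokens, "Unit": "Count"},
        {"MetricName": "GuardrailInterventionRate",
         "Value": guardrail_interventions, "Unit": "Count"},
    ]
    for tool, (total, failed) in tool_calls.items():
        if total == 0:
            continue  # no calls, so no failure rate to report
        data.append({
            "MetricName": "ToolCallFailureRate",
            "Dimensions": [{"Name": "ToolName", "Value": tool}],
            "Value": 100.0 * failed / total,
            "Unit": "Percent",
        })
    return {"Namespace": "FinancialAgent", "MetricData": data}


payload = session_metrics(
    turns=8, tokens=21_500,
    tool_calls={"ExecuteOrder": (4, 1), "MarketDataQuery": (0, 0)},
    guardrail_interventions=0,
)
# With boto3 available: boto3.client("cloudwatch").put_metric_data(**payload)
```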
Traces with X-Ray: AgentCore emits spans for each tool invocation with attributes bedrock.agent.toolName, bedrock.agent.sessionId, and bedrock.agent.turnCount. Configure a trace group with filter annotation.bedrock.agent.toolName = "ExecuteOrder" and alert on latency > 2s — order execution above that indicates a downstream API issue.
CloudTrail for regulatory audit: Each InvokeAgent API call is recorded with the caller ARN, sessionId, and inputText (truncated). For compliance, configure an S3 bucket with Object Lock in COMPLIANCE mode and 7-year retention for AgentCore CloudTrail logs. This is the minimum to meet Banco Central do Brasil and SEC audit requirements.
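The retention rule can be expressed as a default Object Lock configuration on the audit bucket. The sketch below builds a request shaped for the S3 `put_object_lock_configuration` call; the bucket name is hypothetical, and the bucket is assumed to already have versioning and Object Lock enabled, which S3 requires.

```python
from typing import Any


def object_lock_request(bucket: str, years: int = 7) -> dict[str, Any]:
    """Keyword arguments shaped for S3 PutObjectLockConfiguration."""
    if years < 1:
        raise ValueError("retention must be at least one year")
    return {
        "Bucket": bucket,
        "ObjectLockConfiguration": {
            "ObjectLockEnabled": "Enabled",
            # COMPLIANCE mode: retention cannot be shortened or removed,
            # not even by the account root user.
            "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": years}},
        },
    }


lock_request = object_lock_request("example-agent-audit-logs")  # hypothetical bucket
# With boto3 available:
#   boto3.client("s3").put_object_lock_configuration(**lock_request)
```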
Cost anomaly alarm: Configure an AWS Budget with an alert at 80% of the monthly Bedrock budget, with an SNS action. Add a second CloudWatch alarm on the Invocations metric in the AWS/Bedrock namespace, filtered by the ModelId dimension for anthropic.claude-3-5-sonnet, with a threshold of 1,000 invocations/hour; above that, something is wrong.
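The hourly invocation alarm could look roughly like the sketch below, which builds a request shaped for the CloudWatch `put_metric_alarm` call against the `Invocations` metric in the `AWS/Bedrock` namespace. The alarm name and SNS topic ARN are hypothetical, and the model ID is copied as written above; deployed model IDs normally carry a version suffix, which is not filled in here.

```python
from typing import Any


def invocation_alarm_request(model_id: str, sns_topic_arn: str,
                             threshold: int = 1_000) -> dict[str, Any]:
    """Keyword arguments shaped for CloudWatch PutMetricAlarm."""
    return {
        "AlarmName": "bedrock-invocations-per-hour",  # hypothetical name
        "Namespace": "AWS/Bedrock",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "Statistic": "Sum",
        "Period": 3600,           # one-hour window, in seconds
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # idle hours are not an incident
        "AlarmActions": [sns_topic_arn],
    }


alarm = invocation_alarm_request(
    model_id="anthropic.claude-3-5-sonnet",  # as written in the text, no version suffix
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:agent-cost-alerts",  # placeholder
)
# With boto3 available: boto3.client("cloudwatch").put_metric_alarm(**alarm)
```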
Consequences and Risks You Need to Accept
Runtime lock-in is real: If AWS deprecates or significantly changes the AgentCore API, migration requires rewriting the orchestration logic — not just the tools. Mitigate by keeping tools (Lambda) completely runtime-agnostic and documenting the interface contract in a separate ADR.
Conservative quotas on a new service: AgentCore has concurrent agent sessions quotas that, at launch, were significantly lower than traditional Bedrock Agents quotas. Request quota increases before go-live, not after. A peak event without adequate quota results in ThrottlingException that the end client sees as a timeout.
Guardrails have latency: Each pass through Guardrails adds 100-300ms of latency. In an agent with 10 turns, that's up to 3 additional seconds of accumulated latency. For use cases where latency is critical, consider disabling output guardrails on internal tools (not exposed to the end user) and applying them only on the final output.
Memory is not free: AgentCore Memory charges for storage and per read/write operation. In long sessions with SESSION_SUMMARY active, memory cost can exceed inference cost for short sessions. Monitor MemoryReadLatency and MemoryWriteLatency — above 200ms indicates pressure on the managed store.
Human-in-the-loop is not automatic: requireConfirmation: ENABLED on the Gateway pauses execution and waits for confirmation via callback. If the client doesn't respond within confirmationTimeout (default: 300s), the session expires. Design the UX to make this clear to the user — timeout-expired sessions are the leading cause of complaints in financial agents.
Well-Architected Assessment
Security
Declarative guardrails with PII filter and denied topics; AgentCore Gateway with per-tool OAuth2/OIDC; KMS CMK for session memory; CloudTrail with S3 Object Lock for immutable audit. IAM with bedrock:AgentArnLike condition to restrict which agents can invoke which tools.
Reliability
Automatic retry with jitter in the Bedrock SDK (max_attempts=3, mode=adaptive); native circuit breaker in AgentCore Gateway per tool; configurable session timeout prevents zombie sessions; concurrent session quotas must be requested before go-live.
Performance efficiency
SESSION_SUMMARY reduces context tokens by ~40% for long sessions; disabling output guardrails on internal tools reduces accumulated latency; Knowledge Base with OpenSearch k-NN with HNSW and ef_search=512 for low-latency RAG.
Cost optimization
AWS Budget with alert at 80% of monthly limit; CloudWatch alarm on invocations/hour per model-id; SESSION_SUMMARY reduces inference cost in long sessions; monitor AgentCore Memory cost separately from inference cost.
What the AWS Blog Doesn't Tell You
AWS service launch blogs are excellent at showing the happy path. What they rarely cover are the edge cases you only discover in production. Here are the three that cost me the most time:
Tool-call idempotency is not guaranteed by the runtime. If AgentCore attempts to invoke a tool and receives a timeout, it may retry — and your Lambda may be invoked twice for the same action. For idempotent tools (queries), this is harmless. For tools with side effects (order execution, email sending), you need to implement idempotency in the Lambda using an idempotencyToken derived from sessionId + turnId + toolName. Without this, order duplication is a matter of when, not if.
The model can ignore requireConfirmation in certain prompt formulations. I tested this: if the system prompt instructs the agent to "be proactive and execute actions without asking for unnecessary confirmation," the model may rationalize that a specific action doesn't need confirmation even with the flag active. The correct defense is dual: the flag on the Gateway and an explicit instruction in the system prompt about when confirmation is mandatory. Never rely on a single layer.
AgentCore doesn't have native multi-agent support yet. If your architecture requires a supervisor agent delegating to specialized agents (multi-agent orchestration pattern), you'll need to implement the delegation logic manually — typically with an agent that calls other agents via tool-use, where each "tool" is actually an invocation of another AgentCore. It works, but cross-session traceability requires manual sessionId correlation via X-Ray.
Anti-Patterns I've Seen in Architecture Reviews
- Using AgentCore without configuring Guardrails because "it's an internal environment" — insiders are the primary source of compliance incidents in finance
- Storing full session history in memory without SESSION_SUMMARY — token cost grows linearly with number of turns
- Implementing critical business logic inside the agent system prompt instead of in testable tools — prompts don't have unit tests
- Not requesting concurrent session quota increase before go-live — ThrottlingException during peak usage is predictable and preventable
- Assuming AgentCore Gateway replaces a business authorization layer — the Gateway controls access to the tool, not the authorization logic inside the tool
- Not implementing idempotency in tool Lambdas with side effects — runtime retries can duplicate irreversible actions
In practice, what convinced me to adopt AgentCore was not the breadth of the feature list. It was that the AgentCore Gateway with per-tool OAuth2/OIDC solves the tool-call identity problem I was about to build manually, which would have taken two sprints and generated permanent technical debt. The hard-won lesson behind this: in financial environments, the cost of building custom security controls is not the initial development cost; it is the cost of maintaining, auditing, and fixing those controls over years. When a managed service delivers the control as declarative configuration, the decision to adopt it is rarely about feature parity; it is about where you want to allocate your team's engineering attention. My recommendation: adopt AgentCore for new production agents, keep tools portable, and invest the saved time in observability and adversarial prompting tests.
Verdict: Adopt with Explicit Controls
Bedrock AgentCore is the right choice for financial teams that need to put AI agents into production without building and maintaining a custom orchestration runtime. The decision is not binary — it's about recognizing that AgentCore's value lies in the operational controls (Gateway, Guardrails, Memory with CMK), not just the execution runtime. The condition for adoption is clear: configure Guardrails before any testing with real data, implement idempotency in all tools with side effects, request concurrent session quota increases before go-live, and monitor TurnsPerSession and TokensPerSession as first-class SLO metrics. Lock-in is real but manageable if tools are kept portable. For teams that lack the capacity to build and operate a custom agent runtime — which is most teams — AgentCore is the correct architectural decision in 2025.