# Amazon Bedrock AgentCore Harness: From Idea to Production-Grade Agent

AgentCore Harness reached GA in June 2026 as a managed abstraction that collapses the LLM agent control plane into two API calls. In this article, I analyze how the harness works internally, where it fails, and what architects of financial-grade systems need to understand before putting it into production.

- URL: https://fernando.moretes.com/blog/amazon-bedrock-agentcore-harness-da-ideia-ao-agente-de-producao-amazon-bedro

- Markdown: https://fernando.moretes.com/blog/amazon-bedrock-agentcore-harness-da-ideia-ao-agente-de-producao-amazon-bedro/article.md?lang=en

- Published: 2026-06-24T12:00:00.000Z

- Category: AI & Agents

- Tags: bedrock, agentcore, agentic-ai, aws, financial-grade, observability, security, llm-orchestration

- Reading time: 9 min

- Source: [Amazon Bedrock AgentCore harness is now generally available: Go from idea to production-grade agent in minutes](https://aws.amazon.com/blogs/machine-learning/amazon-bedrock-agentcore-harness-is-now-generally-available-go-from-idea-to-production-grade-agent-in-minutes/)

---

The problem AgentCore Harness solves is not intelligence — it is plumbing. Any competent team can get an LLM agent running on a laptop in an afternoon. What blows up the timeline is the production layer: execution isolation, session management, tool routing, credential storage, distributed tracing, and the multiplication of all of that across every new use case. With AgentCore Harness reaching GA, AWS has bet that this orchestration layer can be managed — and that bet carries serious architectural consequences for anyone designing AI systems in regulated environments.

## The Agent Loop as the Common Denominator

Simon Willison defined an LLM agent surgically: *an LLM that runs tools in a loop to achieve a goal*. That definition survived because it captures the shape every real production agent takes — Kiro, Amazon Q Developer, Claude Code, Codex. The loop is the invariant. What varies is everything around it.

When I analyze agent systems in financial environments — reconciliation automation, fraud alert triage, regulatory report generation — the loop itself is rarely the bottleneck. The bottleneck is the infrastructure that sustains the loop: where session state lives between invocations, how tool credentials are rotated without interrupting active conversations, how tenant isolation is enforced when multiple users share the same agent instance, and how each step of the model's reasoning is traced for audit purposes.

AgentCore Harness addresses exactly that layer. It does not replace the model, does not rewrite the prompt, and does not make business decisions. It manages the agent's *control plane*: the sandboxed environment, persistent memory, tool routing, identity storage, and observability. The distinction matters because it defines the boundaries of what you can and cannot delegate to it — and understanding those boundaries is what separates a successful adoption from a production incident.

## AgentCore Harness: Agent Invocation Lifecycle

Full lifecycle of a harness invocation — from API call through the tool loop, memory, identity, and observability. Edges show sequence and interaction type.

### 🟧 AWS — AgentCore Control Plane

- Harness API CreateHarness / InvokeHarness (edge)
- Session Manager actorId + sessionId isolation (compute)
- MicroVM Sandbox filesystem + shell (compute)

### 🤖 AI — Model Routing

- Model Router bedrock / openAI / gemini / liteLLM (ai)
- Agent Loop tool-use → observe → act (ai)

### 🔧 Tools — Execution Layer

- AgentCore Gateway OpenAPI / Lambda / MCP (IAM+JWT) (compute)
- Browser Sandbox click / navigate / screenshot (external)
- Code Interpreter Python + Node sandboxed (compute)
- Inline Function human-in-the-loop / client-side tool (external)

### 🧠 State — Memory & Identity

- Managed Memory SEMANTIC+SUMMARIZATION 30-day TTL, KMS (storage)
- Token Vault AgentCore Identity no raw creds to model (security)

### 📊 Observability — CloudWatch

- CloudWatch Auto-traces every step (data)

### Flows

- caller -> harness_api: 1. InvokeHarness (HTTP)
- harness_api -> session_mgr: 2. resolve actorId/sessionId
- session_mgr -> microvm: 3. provision isolated sandbox
- microvm -> model_router: 4. select provider/model
- model_router -> agent_loop: 5. start loop
- agent_loop -> gateway: tool-use → gateway
- agent_loop -> browser: tool-use → browser
- agent_loop -> code_interp: tool-use → code
- agent_loop -> inline_fn: tool-use → client (HITL)
- agent_loop -> memory: read/write context
- gateway -> token_vault: fetch outbound credential
- agent_loop -> cw_traces: auto-traced event stream
- harness_api -> caller: 6. real-time response stream

## How the Harness Really Works: Two API Calls and What They Hide

The public interface is deliberately simple: `CreateHarness` defines the agent (default model, tools, instructions, memory configuration) and `InvokeHarness` runs it. But what happens between those two calls is substantial.

When `InvokeHarness` arrives, the harness resolves the `actorId` and `sessionId` to determine the correct memory namespace — enforcing multi-tenant isolation by design, not by convention. It then provisions an isolated microVM with its own filesystem and shell. That environment is not shared across concurrent sessions of the same user, let alone across different users. This has direct implications for financial workloads where state contamination between sessions would be a compliance violation.

Model routing is resolved at invocation time. The `model` field on `CreateHarness` sets the default, but any `InvokeHarness` call can override the provider — including a mid-session switch from Claude Opus to GPT-5.5 without context loss. The harness serializes the conversation state and rehydrates it on the new provider. This is non-trivial: different providers have distinct message formats, different context limits, and tool-use behaviors that vary. The harness absorbs that translation complexity.

API credentials for external providers (OpenAI, Gemini, LiteLLM providers) are stored in the AgentCore Identity Token Vault. The model never sees raw credentials — the harness injects authentication headers into the outbound call. In a financial environment, this is the functional equivalent of a secrets manager integrated into the agent's execution plane, without requiring the developer to build that integration manually.

> **Isolation by Design vs. Isolation by Convention:** The memory namespace keyed on `actorId` is not just a convenience feature — it is a compliance control. In financial systems, customer data separation is a regulatory requirement (PCI-DSS, SOC 2, LGPD). The fact that the harness enforces this by default, without requiring the developer to build partitioning logic, removes an entire class of implementation errors that I have seen cause production incidents. The question you should be asking your security team is not 'does the harness isolate data?' — it is 'how do we audit that isolation is working?'. The answer lives in the CloudWatch traces.

## Managed Memory: What You Gain and What You Relinquish

The default memory behavior at GA is revealing of design priorities: if you omit the memory configuration on `CreateHarness`, the harness automatically provisions a memory resource with `SEMANTIC + SUMMARIZATION` strategies, 30-day event expiry, AWS-owned KMS encryption, and multi-tenant namespace isolation. That is a reasonable set of defaults for most use cases.

But in financial environments, those defaults raise specific questions. First, AWS-owned KMS encryption may not satisfy BYOK (Bring Your Own Key) requirements mandated by corporate security policies or sector regulations. If your organization requires control over key material, you need to provision the memory resource explicitly with a CMK (Customer Managed Key) and pass the ARN on `CreateHarness`. The harness supports this, but the default does not.

Second, the 30-day expiry is an operational TTL, not a data retention control. In systems subject to GDPR or LGPD, the right to erasure requires you to be able to delete a specific user's data on demand — not just wait for the TTL to expire. You need to understand the underlying memory resource management API to implement that control.

Third, the `SEMANTIC + SUMMARIZATION` strategies imply the harness is making inferences about which parts of the conversation are relevant to retain. This is powerful for UX, but it means the exact content of messages may not be preserved verbatim. For audit use cases where conversation history fidelity is a requirement, you need to complement harness memory with a separate audit log — which, conveniently, the automatic CloudWatch traces can provide.

## Tool Routing and the Gateway Authorization Model

The harness tool typology is one of the most interesting design decisions in the product. You have four paths: `agentcore_gateway` (governed tools via ARN, with IAM/JWT and per-tool authorization), `remote_mcp` (direct connection to an MCP server by URL), `agentcore_browser` (managed browser sandbox), and `inline_function` (a tool-use event emitted in the stream, awaiting client response — the human-in-the-loop pattern).

For financial systems, the `agentcore_gateway` path is the most relevant because it interposes a governance layer between the agent and downstream APIs. The Gateway supports OpenAPI, Smithy, Lambda, and MCP targets, with IAM or JWT authentication, and per-tool authorization. This means you can give the agent access to a gateway exposing 50 tools, but restrict via policy which tools a specific invocation can use — using the `allowed_tools` parameter on `InvokeHarness`.

The `remote_mcp` path is simpler but less governed: you point to an MCP server URL and the harness connects directly. This is appropriate when the MCP server already has its own authentication layer and you do not need the Gateway's centralized audit. In regulated environments, I prefer the Gateway precisely because it creates a single control point where you can log all tool calls, apply rate limiting policies, and revoke access without modifying the agent.

The `inline_function` deserves special attention: it inverts the flow. Instead of the harness executing the tool, it emits an event in the stream and waits for the client to respond. This enables human approvals in the loop — a common requirement in financial operations workflows where certain actions (transfers above a threshold, configuration modifications) require explicit approval before execution.

## Automatic Observability: What You Get and What You Still Need to Build

The harness automatically traces every step of the agent loop to CloudWatch — no manual instrumentation. Every invocation, every tool call, every model transition, every memory event appears as a traceable trace. For teams that were building this instrumentation manually with OpenTelemetry and Datadog, this is a significant reduction in work.

But 'automatically traced' is not the same as 'operationally observable'. What the harness provides is execution telemetry — what happened, in what order, with what latency. What you still need to build is the interpretation layer: SLOs on agent invocation latency, alerts on tool failure rate, dashboards that correlate memory usage with response quality, and — critical for financial environments — audit trails that record what decisions the agent made and based on what context.

A pattern I use in financial systems is to complement the harness's automatic traces with a structured audit log emitted via `inline_function`. Every time the agent is about to execute a high-impact action, it emits an `inline_function` event that the client captures, logs to an immutable audit system (S3 with Object Lock, for example), and then responds to the harness with approval or rejection. This creates an auditable chain of custody that CloudWatch traces alone do not provide — because traces capture what happened, but the audit log captures why it was approved.

The integration with the new Bedrock Guardrails API for agentic workflows (announced the same week as the harness GA) is the natural complement here: Guardrails can inspect model outputs before they become tool calls, adding a content control layer that operates independently of the Gateway's authorization logic.

## Critical Anti-Patterns in AgentCore Harness Usage

- **Treating the 30-day TTL as a data retention control**: The TTL is operational, not regulatory. In systems subject to LGPD/GDPR, implement explicit deletion via the memory resource management API — do not wait for the TTL to expire to satisfy a right-to-erasure request.
- **Using `remote_mcp` for high-impact tools without a governance layer**: Connecting directly to an MCP server bypasses the Gateway's per-tool authorization model. For financial actions (transfers, configuration modifications), always interpose `agentcore_gateway` for centralized audit and revocation control.
- **Assuming the default encryption satisfies BYOK requirements**: Default encryption uses AWS-owned keys. If your security policy requires CMKs (Customer Managed Keys) — common in financial institutions — you must explicitly provision the memory resource with your own CMK before creating the harness.
- **Treating automatic CloudWatch traces as a complete audit trail**: Traces capture execution telemetry, not business intent. For regulatory audit, complement with a structured audit log that captures decision context — use `inline_function` to intercept high-impact actions before execution.
- **Switching models mid-session without validating tool-use compatibility**: Not all providers support the same tool-use schemas. When switching providers mid-session, validate that the new model supports the configured tools — especially for tools with complex or multi-step schemas.
- **Ignoring the `allowed_tools` parameter on InvokeHarness**: Creating a harness with a broad tool set and invoking without restricting via `allowed_tools` violates least privilege. For each invocation type, explicitly define the minimum necessary tool set.

## AgentCore Harness Through the AWS Well-Architected Lens

- **security**: The Token Vault removes raw credentials from the model's execution plane — that is Zero Trust applied to the agent. But you need to go further: use `agentcore_gateway` with IAM policies with `aws:RequestedRegion` and `aws:SourceAccount` conditions to limit the blast radius of a compromised tool. For memory, provision CMKs explicitly and implement key rotation. Add Bedrock Guardrails to the model output pipeline to inspect content before tool-use.
- **reliability**: The harness manages agent loop retries internally, but external tool failures are propagated as events in the stream. Implement idempotency in your downstream tools — the agent may attempt to re-execute a tool after a timeout. For the Gateway, configure circuit breakers at the target level. Mid-session model switching has a failure point: if the new provider is unavailable, the harness should have a configured fallback. Explicitly test provider failover behavior in your runbooks.

> **Architect's Note: What I Would Do Differently:** In any financial deployment of AgentCore Harness, I would not accept the memory defaults — I would provision the resource explicitly with a CMK and TTL aligned to the organization's data retention policy, before creating the harness. The hard-won lesson from years of financial systems in the cloud is that security defaults reasonable for general use rarely satisfy sector-specific regulatory requirements — and fixing that after production data is already in the system is always more expensive than getting it right from the start. I would also instrument the `inline_function` pattern for every high-impact action from day zero, creating a structured audit trail in S3 with Object Lock — CloudWatch traces are excellent for operational debugging, but they do not substitute an immutable audit log for regulatory purposes. Finally, I would treat `allowed_tools` as a security control, not an optional optimization — each invocation type would have an explicitly defined, security-team-reviewed set of permitted tools.

## Verdict: A Solid Architectural Bet with Conditions

AgentCore Harness GA solves a real and concrete problem: the multiplication of agent infrastructure overhead with every new use case. The two-API-call abstraction is well-designed — it collapses isolation, memory, tool routing, identity, and observability into a configurable interface without sacrificing the flexibility to replace any component. For teams building agents in non-regulated environments or in experimentation phases, the harness is a clear choice: you get production-grade in minutes without building the control plane.

For regulated financial environments, the harness is viable — but with non-negotiable conditions. You need to: (1) provision memory explicitly with a CMK and deletion policy aligned to LGPD/GDPR; (2) use `agentcore_gateway` (not `remote_mcp`) for all high-impact tools; (3) implement `inline_function` for human approvals on critical actions; (4) complement CloudWatch traces with an immutable audit log; and (5) treat `allowed_tools` as a security control reviewed by the security team. With those conditions satisfied, the harness is a legitimate architectural accelerator — not a shortcut you will regret in an audit.

## References

- [Amazon Bedrock AgentCore Harness GA Announcement](https://aws.amazon.com/blogs/machine-learning/amazon-bedrock-agentcore-harness-is-now-generally-available-go-from-idea-to-production-grade-agent-in-minutes/)
- [Amazon Bedrock AgentCore — Developer Documentation](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/what-is-agentcore.html)
- [Amazon Bedrock Guardrails — Agentic AI Workflows API](https://aws.amazon.com/about-aws/whats-new/2026/06/amazon-bedrock-guardrails-api-ai/)
- [Amazon Bedrock AgentCore Optimization Capabilities](https://aws.amazon.com/about-aws/whats-new/2026/06/amazon-bedrock-agentcore-new-optimization-capabilities)
- [AWS Well-Architected Framework — Machine Learning Lens](https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/wellarchitected-machine-learning-lens.html)
- [AWS KMS — Customer Managed Keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk)
- [Model Context Protocol (MCP) Specification](https://modelcontextprotocol.io/specification)
- [Simon Willison — What is an LLM Agent?](https://simonwillison.net/2025/Jun/)
