Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

What technical topics does Fernando work with?

Fernando works with AWS, Kubernetes, Kafka, Data Mesh, Amazon Bedrock, RAG, DevSecOps, observability, financial systems and architecture communication using C4, ADRs and trade-off analysis.

Is Fernando available for professional conversations?

Fernando is currently building at Banco Itaú and is open to thoughtful conversations about architecture, cloud, AI, engineering leadership, community, podcasts and technical collaboration.

Guide / Deep DiveIA / Agentes

Inside AI Agents (2/3): Architecture Pattern Catalog — from ReAct to Multi-Agent

Jun 26, 2026 8 min AI-assisted

Listen to study

generated on play

Generated only on first play

On demand

0:000:00

Speed

The MP3 is saved to S3 after the first play.

The second lesson in the series maps the full catalog of AI agent architecture patterns: from single-agent loops (ReAct, Reflexion, Plan-and-Execute) to multi-agent orchestration, covering memory as an architecture decision, guardrails, and human-in-the-loop. The goal is to give the architect a precise vocabulary to choose — and justify — the right pattern for each problem, without falling into classic anti-patterns.

In Part 1 of this series you understood what an AI agent is: a loop that perceives, reasons, acts, and observes results — different from a simple prompt pipeline. Now comes the question that really matters for system designers: which pattern to use? ReAct? Plan-and-Execute? Multi-agent with a supervisor? The wrong answer costs money (tokens are expensive), latency (chained loops add up in seconds), and reliability (every hop is a failure point). This guide is the catalog I wish I had when I started designing agents for production: each pattern explained from scratch, with when to use it, when to avoid it, and the anti-patterns that show up every time.

What you will learn

The 5 single-agent patterns (ReAct, Reflexion, Plan-and-Execute, Tool Use, Agentic RAG) and when each one fits

Memory as an architecture decision: context window, session, long-term — and why poorly designed memory becomes cost and hallucination

The 4 multi-agent topologies (supervisor, hierarchical, swarm, specialist routing) and the real criterion for choosing

Guardrails and security as an architecture layer — not an afterthought

Human-in-the-loop: when to require human approval and how to model checkpoints without blocking the flow

Classic anti-patterns: premature multi-agent, tool sprawl, memory without TTL, infinite loop

Quick Glossary — Terms that appear in this lesson

LLM: Large Language Model — the language model that does the agent's core reasoning (e.g. Claude, GPT-4, Titan).
Tool / Function: External function the agent can call: REST API, SQL query, vector search, calculator, etc.
Context Window: Token limit the LLM can 'see' at once. Everything outside the window is invisible to the model — like process RAM.
RAG: Retrieval-Augmented Generation — fetching relevant documents and injecting them into the prompt before generating a response.
Guardrail: Validation layer that filters agent inputs/outputs — analogous to a WAF for LLMs.
Prompt Injection: Attack where external data (e.g. email content) contains malicious instructions that hijack the agent's behavior.
Orchestrator / Supervisor: Agent or process that breaks down tasks and delegates to specialized sub-agents.
TTL (Time-to-Live): Expiry time for data in cache/memory. Without TTL, agent memory grows indefinitely.

The Fundamental Loop: why every pattern starts here

If you read Part 1, you know that an agent is essentially a loop: the LLM receives an observation, reasons, decides on an action, executes it, observes the result, and repeats. What differentiates architectural patterns is not the loop itself — it is how the reasoning is structured inside it and how many agents participate.

Think like a software developer. You have written a while with a stop condition. An agent is exactly that, but the stop condition is decided by the LLM ('is the task complete?'), and the loop body can call external tools. The engineering problem is: how do you structure the reasoning inside the loop so that it is reliable, auditable, and does not enter an infinite loop?

ReAct (Reasoning + Acting, Yao et al., 2022) was the first pattern to formalize this: the model explicitly alternates between a Thought step (natural language reasoning), an Action step (tool call), and an Observation step (tool result). This explicit alternation is powerful because it makes reasoning auditable — you can read the log and understand why the agent made each decision. It is the equivalent of writing code with inline comments instead of a monolithic block.

Diagram 1 — Single-Agent ReAct with Tools

A single LLM agent in a ReAct loop (Thought → Action → Observation). Tools are synchronous calls; the guardrail inspects input and output. The loop ends when the LLM emits 'Final Answer' or the step budget is exhausted.

👤 User / Client

User · request

🛡️ Security Layer

Input Guardrail · prompt injection / PII filter
Output Guardrail · toxicity / data leak filter

🤖 Agent Core (ReAct Loop)

LLM · Claude / GPT-4 / Titan
Thought · reasoning step
Action · tool selection + args
Observation · tool result injected

🔧 Tools

Vector Search · Agentic RAG
External API · REST / GraphQL
Calculator · deterministic fn

💾 Memory

Context Window · in-prompt (ephemeral)
Session Memory · short-term store

Beyond ReAct: Reflexion, Plan-and-Execute, and Agentic RAG

Reflexion (Shinn et al., 2023) adds a second inner loop of self-critique: after each attempt, the agent generates a reflection on what went wrong and stores that reflection in session memory before trying again. It is like a developer who writes a test, sees it fail, notes the diagnosis, and then fixes it — instead of simply trying again in the dark. The cost is clear: more tokens per attempt. Use it when the task has a verifiable correctness criterion (e.g. code that must compile, SQL that must return a non-empty result).

Plan-and-Execute separates reasoning into two distinct phases: a Planner LLM generates a step-by-step plan (think of a Makefile), and an Executor LLM executes each step independently. The advantage is that the plan can be inspected and approved by a human before execution — excellent for high-risk flows. The disadvantage is rigidity: if the environment changes during execution, the plan may become stale.

Agentic RAG is the upgrade from static RAG that every architect should know. In traditional RAG, you always fetch documents before generating. In Agentic RAG, the agent decides when to fetch, formulates the search query as part of its reasoning, and can perform multiple iterative searches ('I did not find it, I will reformulate the query'). This reduces noise in the context — you do not inject irrelevant documents — but requires the agent to know when it does not know something, which is a model calibration problem.

Single-Agent Patterns — When to use each one

	Pattern	Code analogy	Best for	Main cost	Avoid when
ReAct	while loop with log	Exploratory tasks with tools	Tokens per iteration	Task has a direct answer without tools	—
Reflexion	TDD with self-critique	Code generation, SQL, verifiable logic	2-3x more tokens per attempt	No clear correctness criterion	—
Plan-and-Execute	Makefile + executor	Long flows with human approval	Planning latency + rigidity	Environment changes during execution	—
Tool Use / Function Calling	SDK with typed methods	Integration with external APIs	Tool I/O latency	More than 15-20 tools (tool sprawl)	—
Agentic RAG	Lazy loading of knowledge	Large, heterogeneous knowledge bases	Model calibration to know when to search	Small, stable knowledge base (static RAG suffices)	—

Memory as Architecture: the problem nobody models correctly

Memory in agents is analogous to storage in microservices: you have RAM (fast, expensive, volatile), local disk (cheaper, session-persistent), and distributed database (slow, cheap, shared). Each layer has a different trade-off.

The context window is the agent's RAM: everything in the active prompt. It is the fastest memory, but has linear token cost — every extra token you inject into the context increases inference cost and latency. The classic mistake is to dump everything into the context window ('I will include the full conversation history') and discover that the API bill tripled.

Session memory (short-term) is the equivalent of a Redis cache per session: you persist the current conversation history outside the prompt and inject only a summary or the last N turns. This solves the cost problem, but requires a summarization strategy — and LLM-based summarization has a cost too.

Long-term memory — vector, episodic, semantic — is the distributed database. You store facts, user preferences, past episodes in a vector store (e.g. OpenSearch, pgvector) and retrieve by semantic similarity when relevant. The risk here is twofold: memory without TTL grows indefinitely (storage cost and search latency), and stale memory generates hallucinations — the agent 'remembers' a fact that has changed.

Diagram 2 — Agent Memory Layers

Three memory layers with increasing cost and latency. The agent chooses which layer to consult/write at each step of the ReAct loop. The 'summarization' arrow shows that the context window can be compressed into session memory.

🤖 Agent LLM

Agent LLM · ReAct loop

⚡ Layer 1 — Context Window (RAM)

Context Window · ~200k tokens max · ephemeral, costly

🗂️ Layer 2 — Session Memory (Cache)

Session Store · Redis / DynamoDB · TTL = session lifetime
Summarizer · LLM compression

🗄️ Layer 3 — Long-Term Memory (DB)

Vector Store · OpenSearch / pgvector · semantic retrieval
Episodic Store · structured facts · user preferences
TTL Policy · stale-data eviction

Multi-Agent: when it truly helps and when it only adds complexity

The most important question about multi-agent is not 'which topology to use' — it is 'do I really need multi-agent?' The honest answer is: in most cases, no. A well-designed single agent with the right tools solves 80% of the problems people try to solve with complex multi-agent architectures.

Multi-agent makes sense in three real scenarios: (1) genuine parallelism — independent sub-tasks that can run simultaneously (e.g. analyzing 10 documents in parallel); (2) domain specialization — a sub-agent with a system prompt and tools optimized for a specific domain (e.g. a compliance agent vs. a financial analysis agent); (3) context isolation — you do not want the context of one sub-task to contaminate the reasoning of another.

The four main topologies are: Supervisor/Orchestrator (a central agent delegates to sub-agents and consolidates results — like a tech lead distributing tasks); Hierarchical (supervisor of supervisors — for very large problems); Swarm/Peers (agents without hierarchy that communicate via messages — more resilient, harder to debug); and Specialist Routing (a router LLM classifies the task and sends it to the correct specialist agent — analogous to an API Gateway with semantic routing).

The decision criterion I use in production: if you cannot explain in one sentence why each agent exists separately, you probably do not need multi-agent.

Diagram 3 — Supervisor → Specialist Sub-Agents Topology

The orchestrator receives the user task, decomposes it into sub-tasks, delegates to specialized agents (with their own tools and memory), and consolidates results. Guardrails operate at each boundary. Human-in-the-loop is a checkpoint before high-risk action.

👤 User / Client

User · request

🛡️ Entry Guardrail

Entry Guardrail · injection / scope check

🧠 Orchestrator Agent

Supervisor LLM · task decomposition · + result consolidation
Task Router · which sub-agent?

🔬 Specialist Sub-Agents

Compliance Agent · regulatory tools · narrow context
Finance Agent · data / calc tools · narrow context
Document Agent · Agentic RAG · vector store

🧑‍💼 Human-in-the-Loop

Human Checkpoint · approve / reject · high-risk action

🛡️ Exit Guardrail

Exit Guardrail · output validation · data leak check

Guardrails, Security, and Human-in-the-Loop: architecture, not afterthought

The most common mistake I see in teams coming to AI agents is treating security as a layer added afterwards. That does not work. Guardrails need to be designed as part of the flow from the start, because they affect latency, cost, and tool design.

Prompt injection is the SQL injection of agents: external data (an email, a document, an API response) can contain instructions that hijack the agent's behavior. The defense is layered: (1) clearly separate data from instructions in the prompt (use explicit delimiters); (2) validate in the input guardrail whether data content contains instruction patterns; (3) apply the principle of least privilege to tools — an agent that only needs to read documents should not have a database write tool.

Tool sprawl is the anti-pattern of giving the agent 30 tools 'because it might need them'. The LLM must choose the right tool from all available ones — the more tools, the higher the probability of wrong selection and the larger the system prompt. Practical rule: if an agent has more than 15 tools, consider splitting into specialized sub-agents.

Human-in-the-loop is not a sign of architectural weakness — it is a business requirement in any system that executes irreversible actions (financial transfers, mass email sending, data deletion). The correct pattern is to model explicit checkpoints in the Plan-and-Execute flow: the plan is generated, a human approves, execution begins. This is also what Amazon Bedrock Agents supports natively with 'human approval steps'.

Classic Agent Anti-Patterns — Symptom, Cause, and Remedy

	Anti-Pattern	Symptom	Root cause	Remedy
Premature multi-agent	High latency, impossible debugging	Complexity before real need	Start single-agent; add agents only with clear justification	—
Tool Sprawl	Agent picks wrong tool, random errors	More than 15-20 tools per agent	Split into specialized sub-agents with minimal tools	—
Memory without TTL	Growing cost, hallucinations from stale facts	No expiration policy on vector store	TTL per data type; re-index facts that change	—
Infinite loop	Timeout, exploding cost, no response	No step budget	Set max_iterations; explicit fallback to 'I do not know'	—
Polluted context	Inconsistent responses, high cost	Full history always in prompt	Summarization + session memory; inject only what is relevant	—

Where to start — Mental checklist for the architect

✅ Start with the simplest pattern that solves the problem: a ReAct agent with 3-5 tools solves most cases.

✅ Define the step budget (max_iterations) before going to production — without it, you are guaranteed an infinite loop.

✅ Model memory layers explicitly: what goes in the context window, what goes in session, what goes in the vector store — and what the TTL is for each.

✅ Input and output guardrails are mandatory in production — even if simple at first. Add prompt injection detection from day 1.

✅ Before adding a second agent, write in one sentence why it exists separately. If you cannot, do not add it.

✅ Identify the irreversible actions in your system and add human-in-the-loop checkpoints for each of them.

Architect's Perspective — For those making this transition

Senior Solutions Architect

If you come from distributed systems, the biggest trap when designing agents is the instinct to add complexity to solve reliability problems. In microservices, when something fails, you add retry, circuit breaker, saga. In agents, the answer is almost always: simplify the loop, reduce the tools, improve the prompt. Multi-agent is powerful, but it is the equivalent of microservices — you pay the coordination cost in latency, tokens, and operational complexity. I only add a second agent when I can measure the concrete benefit: latency reduction through parallelism, or measurable quality improvement through specialization. What convinced me to take guardrails seriously was not theory — it was watching an agent in staging get hijacked by a prompt injection hidden in the body of an email it was processing. Part 3 of this series will show how Bedrock AgentCore handles these problems with managed infrastructure, but the principles in this lesson apply to any stack.

Verdict — What to take away

The pattern catalog in this lesson is not a list of equivalent options — it is a staircase of complexity. ReAct is step 1: it solves most problems, is auditable, and cheap to operate. Reflexion and Plan-and-Execute are steps 2 and 3: they add self-correction capability and human approval, but at the cost of tokens and latency. Multi-agent is step 4: only climb there when the previous steps fail for measurable reasons. Memory and guardrails are not steps — they are foundations that must be present from step 1. Part 3 of this series closes the loop by showing how Amazon Bedrock AgentCore implements these patterns in managed infrastructure, with native support for memory, guardrails, and human-in-the-loop.

References

Anthropic — Building effective agents AWS — Multi-agent orchestration on Amazon Bedrock Yao et al. — ReAct: Synergizing Reasoning and Acting in Language Models (arXiv 2210.03629)Shinn et al. — Reflexion: Language Agents with Verbal Reinforcement Learning (arXiv 2303.11366)Gregor Hohpe — The Architecture Elevator (book)AWS — Amazon Bedrock Agents documentation

#ai-agents#ReAct#multi-agent#memory#guardrails#orchestration#LLM#architecture-patterns

Case sources

Anthropic — Building effective agents AWS — Multi-agent orchestration on Amazon Bedrock Yao et al. — ReAct Shinn et al. — Reflexion

Liked this study? Get the next one.

Post-mortems, ADRs and architecture deep dives in your inbox — the way an architect reads them.

No spam · unsubscribe anytime

Written with AI assistance from the public case and my architect's reading.

Ask Fernando about this

Get a focused answer about this study from my AI assistant, grounded in my work.

Join the conversation

Verify your email to join in — you'll also get the newsletter. No password.

Guide / Deep DiveIA / Agentes

Inside AI Agents (2/3): Architecture Pattern Catalog — from ReAct to Multi-Agent

Jun 26, 2026 8 min AI-assisted

Listen to study

generated on play

Generated only on first play

On demand

0:000:00

Speed

The MP3 is saved to S3 after the first play.

What you will learn

The 5 single-agent patterns (ReAct, Reflexion, Plan-and-Execute, Tool Use, Agentic RAG) and when each one fits

Memory as an architecture decision: context window, session, long-term — and why poorly designed memory becomes cost and hallucination

The 4 multi-agent topologies (supervisor, hierarchical, swarm, specialist routing) and the real criterion for choosing

Guardrails and security as an architecture layer — not an afterthought

Human-in-the-loop: when to require human approval and how to model checkpoints without blocking the flow

Classic anti-patterns: premature multi-agent, tool sprawl, memory without TTL, infinite loop

Quick Glossary — Terms that appear in this lesson

LLM: Large Language Model — the language model that does the agent's core reasoning (e.g. Claude, GPT-4, Titan).
Tool / Function: External function the agent can call: REST API, SQL query, vector search, calculator, etc.
Context Window: Token limit the LLM can 'see' at once. Everything outside the window is invisible to the model — like process RAM.
RAG: Retrieval-Augmented Generation — fetching relevant documents and injecting them into the prompt before generating a response.
Guardrail: Validation layer that filters agent inputs/outputs — analogous to a WAF for LLMs.
Prompt Injection: Attack where external data (e.g. email content) contains malicious instructions that hijack the agent's behavior.
Orchestrator / Supervisor: Agent or process that breaks down tasks and delegates to specialized sub-agents.
TTL (Time-to-Live): Expiry time for data in cache/memory. Without TTL, agent memory grows indefinitely.

The Fundamental Loop: why every pattern starts here

Diagram 1 — Single-Agent ReAct with Tools

👤 User / Client

User · request

🛡️ Security Layer

Input Guardrail · prompt injection / PII filter
Output Guardrail · toxicity / data leak filter

🤖 Agent Core (ReAct Loop)

LLM · Claude / GPT-4 / Titan
Thought · reasoning step
Action · tool selection + args
Observation · tool result injected

🔧 Tools

Vector Search · Agentic RAG
External API · REST / GraphQL
Calculator · deterministic fn

💾 Memory

Context Window · in-prompt (ephemeral)
Session Memory · short-term store

Beyond ReAct: Reflexion, Plan-and-Execute, and Agentic RAG

Single-Agent Patterns — When to use each one

	Pattern	Code analogy	Best for	Main cost	Avoid when
ReAct	while loop with log	Exploratory tasks with tools	Tokens per iteration	Task has a direct answer without tools	—
Reflexion	TDD with self-critique	Code generation, SQL, verifiable logic	2-3x more tokens per attempt	No clear correctness criterion	—
Plan-and-Execute	Makefile + executor	Long flows with human approval	Planning latency + rigidity	Environment changes during execution	—
Tool Use / Function Calling	SDK with typed methods	Integration with external APIs	Tool I/O latency	More than 15-20 tools (tool sprawl)	—
Agentic RAG	Lazy loading of knowledge	Large, heterogeneous knowledge bases	Model calibration to know when to search	Small, stable knowledge base (static RAG suffices)	—

Memory as Architecture: the problem nobody models correctly

Diagram 2 — Agent Memory Layers

🤖 Agent LLM

Agent LLM · ReAct loop

⚡ Layer 1 — Context Window (RAM)

Context Window · ~200k tokens max · ephemeral, costly

🗂️ Layer 2 — Session Memory (Cache)

Session Store · Redis / DynamoDB · TTL = session lifetime
Summarizer · LLM compression

🗄️ Layer 3 — Long-Term Memory (DB)

Vector Store · OpenSearch / pgvector · semantic retrieval
Episodic Store · structured facts · user preferences
TTL Policy · stale-data eviction

Multi-Agent: when it truly helps and when it only adds complexity

The decision criterion I use in production: if you cannot explain in one sentence why each agent exists separately, you probably do not need multi-agent.

Diagram 3 — Supervisor → Specialist Sub-Agents Topology

👤 User / Client

User · request

🛡️ Entry Guardrail

Entry Guardrail · injection / scope check

🧠 Orchestrator Agent

Supervisor LLM · task decomposition · + result consolidation
Task Router · which sub-agent?

🔬 Specialist Sub-Agents

Compliance Agent · regulatory tools · narrow context
Finance Agent · data / calc tools · narrow context
Document Agent · Agentic RAG · vector store

🧑‍💼 Human-in-the-Loop

Human Checkpoint · approve / reject · high-risk action

🛡️ Exit Guardrail

Exit Guardrail · output validation · data leak check

Guardrails, Security, and Human-in-the-Loop: architecture, not afterthought

Classic Agent Anti-Patterns — Symptom, Cause, and Remedy

	Anti-Pattern	Symptom	Root cause	Remedy
Premature multi-agent	High latency, impossible debugging	Complexity before real need	Start single-agent; add agents only with clear justification	—
Tool Sprawl	Agent picks wrong tool, random errors	More than 15-20 tools per agent	Split into specialized sub-agents with minimal tools	—
Memory without TTL	Growing cost, hallucinations from stale facts	No expiration policy on vector store	TTL per data type; re-index facts that change	—
Infinite loop	Timeout, exploding cost, no response	No step budget	Set max_iterations; explicit fallback to 'I do not know'	—
Polluted context	Inconsistent responses, high cost	Full history always in prompt	Summarization + session memory; inject only what is relevant	—

Where to start — Mental checklist for the architect

✅ Start with the simplest pattern that solves the problem: a ReAct agent with 3-5 tools solves most cases.

✅ Define the step budget (max_iterations) before going to production — without it, you are guaranteed an infinite loop.

✅ Model memory layers explicitly: what goes in the context window, what goes in session, what goes in the vector store — and what the TTL is for each.

✅ Input and output guardrails are mandatory in production — even if simple at first. Add prompt injection detection from day 1.

✅ Before adding a second agent, write in one sentence why it exists separately. If you cannot, do not add it.

✅ Identify the irreversible actions in your system and add human-in-the-loop checkpoints for each of them.

Architect's Perspective — For those making this transition

Senior Solutions Architect

Verdict — What to take away

References

#ai-agents#ReAct#multi-agent#memory#guardrails#orchestration#LLM#architecture-patterns

Case sources

Anthropic — Building effective agents AWS — Multi-agent orchestration on Amazon Bedrock Yao et al. — ReAct Shinn et al. — Reflexion

Liked this study? Get the next one.

Post-mortems, ADRs and architecture deep dives in your inbox — the way an architect reads them.

No spam · unsubscribe anytime

Written with AI assistance from the public case and my architect's reading.

Ask Fernando about this

Get a focused answer about this study from my AI assistant, grounded in my work.

Join the conversation

Verify your email to join in — you'll also get the newsletter. No password.