# Inside AI Agents (1/3): Anatomy and the Reasoning Loop

A technical lesson for developers and architects who hear 'AI agent' every day but want to truly understand what differentiates an agent from a plain LLM, a fixed pipeline, or simple RAG. We cover the full anatomy — model, tools, memory, planner — and the ReAct loop step by step with a concrete example. No hype; real trade-offs.

- URL: https://fernando.moretes.com/studies/agentes-de-ia-por-dentro-1-o-que-e-um-agente

- Markdown: https://fernando.moretes.com/studies/agentes-de-ia-por-dentro-1-o-que-e-um-agente/study.md?lang=en

- Type: Guide / Deep Dive

- Domain: IA / Agentes

- Date: 2026-06-26

- Tags: ai-agents, llm, react-loop, function-calling, architecture, bedrock, foundational, series

- Reading time: 8 min

---

If you are a software developer or architect entering the generative AI world, you have probably heard 'AI agent' dozens of times — in product demos, job postings, startup pitches. The problem is the term gets used for radically different things: from a chatbot with sophisticated if/else logic to systems that genuinely make autonomous decisions and execute actions in the real world. That confusion has a cost: teams build the wrong solution for the right problem, or spend unnecessary complexity where a simple pipeline would do. This is Part 1 of a three-part series. Here you will understand what an AI agent really is — with conceptual precision, no empty jargon. Part 2 covers multi-agent architecture patterns (orchestrator, subagents, composition patterns). Part 3 closes with production on AWS using Bedrock AgentCore. Start here.

## What you will learn

- The precise definition of an AI agent and the five ingredients that compose it
- What differentiates an agent from a plain LLM, a fixed workflow, and simple RAG — with a comparison table
- The internal anatomy: model (brain), tools, short- and long-term memory, and the planner
- The ReAct loop (perception → reasoning → action → observation) explained step by step with a concrete example
- Tool/function calling from the inside: how the model emits a structured intent and the runtime executes it
- When you need an agent — and when you do not (the honest answer)

## Quick Glossary — Terms in This Lesson

- **LLM:** Large Language Model — a large language model trained to predict the next token. Think of it as a function: text in, text out. Stateless, no memory between calls.
- **AI Agent:** A system that uses an LLM as its reasoning engine, has a goal, accesses tools, maintains memory, and autonomously decides next steps until the task is complete.
- **Tool / Function Calling:** Mechanism by which the LLM emits a structured intent (JSON) saying 'I want to call this function with these parameters'. The runtime executes and returns the result.
- **RAG:** Retrieval-Augmented Generation — pattern where you fetch relevant documents from a vector store and inject them into the LLM context before generating the response. Fixed flow, no autonomy.
- **ReAct:** Reasoning + Acting — framework by Yao et al. (2022) that interleaves reasoning steps ('Thought') with action steps ('Action') and result observation ('Observation') in a loop.
- **Planner:** Component (can be the LLM itself) responsible for decomposing a goal into subtasks and deciding execution order.
- **Context Window:** The LLM's 'working memory buffer' — everything it can 'see' in a single call. It is the agent's short-term memory.
- **Runtime / Orchestrator:** The code that manages the agent loop: calls the LLM, interprets the tool call intent, executes the tool, injects the result back into context, and repeats.

## Riding the Elevator: from business to technology

Gregor Hohpe uses the 'Architecture Elevator' metaphor to describe the architect's work: consciously moving between the business floor (the *why*) and the technology floor (the *how*), without losing the connecting thread. We will use that here.

**Business floor:** Companies want to automate tasks that today require a human making sequential decisions — researching information, interpreting results, deciding the next step, executing an action, checking whether it worked, correcting course. A financial analyst who researches market data, compares it with an internal policy, decides whether more data is needed, and then drafts a report is doing exactly this. The value lies in the *autonomy of the decision cycle*, not just text generation.

**Technology floor:** To replicate this in software, you need more than an LLM. You need a system that perceives the state of the world, reasons about it, decides on an action, executes that action via external tools, observes the result, and iterates — all guided by a goal. That is an agent. The distinction is not philosophical; it has direct architectural consequences: latency, cost per task, failure surface, observability, and security all change completely when you move from a single LLM call to an autonomous loop.

## The Precise Definition: five ingredients

An AI agent is a computational system with five ingredients working together:

1. **LLM as reasoning engine** — the model is not the agent; it is the *brain*. Think of it as the CPU of a computer: powerful, but inert without the rest of the system.
2. **Goal** — a task or objective the agent must achieve. Without a goal, the LLM is just a responsive text generator.
3. **Tools** — interfaces to the external world: APIs, databases, web searches, code execution, internal systems. These are the agent's *effectors* — hands and eyes.
4. **Memory** — short-term (the context window of the current call) and long-term (external storage: vector store, relational database, cache). Without long-term memory, the agent forgets everything between sessions.
5. **Autonomy to decide next steps** — this is the ingredient that separates an agent from everything else. The agent decides *at runtime* which tools to call, in what order, how many times, and when to stop. It is not a pre-defined flow graph.

Anthropic summarizes it well in their guide: *'Agents are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.'* Dynamic autonomy is the key phrase.

## Agent × Workflow × RAG × Plain LLM — what each one is
| Criterion | Dimension | Plain LLM | Simple RAG | Fixed Workflow / Pipeline | Autonomous Agent |
| --- | --- | --- | --- | --- | --- |
| Who decides the flow? | Nobody — one call | Fixed code (retrieve → generate) | Fixed code (step graph) | The LLM itself at runtime | — |
| External tools? | No | Vector search only | Yes, but pre-defined and ordered | Yes, chosen dynamically | — |
| Memory between steps? | No | Injected context, stateless | State passed between graph nodes | Short-term (context) + long-term (external) | — |
| Number of LLM calls? | 1 | 1 (or few, fixed) | N fixed (defined in graph) | N variable (agent decides) | — |
| Cost / latency | Low and predictable | Low and predictable | Medium and predictable | High and variable (loop risk) | — |
| When to use? | Simple task, direct answer | Q&A over internal documents | Known process, fixed steps | Open-ended task, steps unknown a priori | — |

## Diagram 1: AI Agent Anatomy

The five internal components of an agent and how they relate. The LLM is the reasoning center; the runtime/orchestrator is the control loop; tools are the outputs to the world; memory is the persisted state.

### 🧠 Núcleo de Raciocínio / Reasoning Core

- LLM (Modelo / Brain) (ai)
- Planner (decompõe o objetivo) (ai)

### 🔄 Runtime / Orchestrator

- Agent Runtime (loop de controle) (compute)

### 🧰 Ferramentas / Tools

- Web Search API (external)
- Database Query (data)
- Code Executor (compute)
- External API / System (external)

### 💾 Memória / Memory

- Memória Curto Prazo (Context Window) (storage)
- Memória Longo Prazo (Vector Store / DB) (storage)

### 👤 Entrada / Input

- Usuário / Sistema (objetivo / tarefa) (user)

### Flows

- user -> runtime: goal
- runtime -> llm: prompt + context
- llm -> planner: reasoning / decomposition
- planner -> runtime: next step
- runtime -> tool_search: tool call
- runtime -> tool_db: tool call
- runtime -> tool_code: tool call
- runtime -> tool_api: tool call
- tool_search -> runtime: result
- tool_db -> runtime: result
- tool_code -> runtime: result
- tool_api -> runtime: result
- runtime -> mem_short: read/write context
- runtime -> mem_long: persist / retrieve
- runtime -> user: final answer

## Tool Calling from the Inside: the heart of the agent

If you have worked with REST APIs, you will understand function calling immediately with this analogy: the LLM is a client that cannot make HTTP calls directly. Instead, it writes a *call request* in a structured format (JSON), and the runtime acts as a proxy that executes the real call and returns the result.

In practice, it works like this. You register tools in the system with a JSON schema — name, description, parameters. The LLM receives that schema alongside the user prompt. When the model decides it needs a tool, instead of generating free text, it emits a special JSON object:

```json
{
  "tool_use": "search_web",
  "parameters": { "query": "Selic rate today" }
}
```

The runtime intercepts this, executes the `search_web` function with the provided parameter, gets the result (say, `"13.25% p.a."`) and injects it back into context as an observation message. The LLM then continues reasoning with that new information available.

Why is this the heart of the agent? Because it is the mechanism that transforms the LLM from a passive text generator into an actor that affects the world. Without tool calling, the agent does not act — it only plans out loud. With tool calling, each loop iteration can change external state: create a file, call an API, write to a database. This is also where the risk lives: a poorly specified tool or one without guardrails can cause irreversible side effects. We will return to this in Part 2.

## Diagram 2: The ReAct Loop — Perception → Reasoning → Action → Observation

Concrete example: agent receives the task 'What is the impact of Selic rate changes over the last 6 months on the average CDB price of large banks?' — a task requiring multiple steps and sources.

### 👤 Entrada / Input

- Tarefa do Usuário (objetivo complexo) (user)

### 🔄 Loop ReAct (iterações do agente)

- Thought 1 'Preciso da série histórica da Selic' (ai)
- Action 1 search_web('Selic histórico 6 meses') (compute)
- Observation 1 [dados da Selic retornados] (data)
- Thought 2 'Agora preciso de dados de CDB' (ai)
- Action 2 query_db('CDB grandes bancos 6m') (compute)
- Observation 2 [taxas CDB retornadas] (data)
- Thought 3 'Tenho dados suficientes — calcular correlação' (ai)
- Action 3 execute_code('correlação Selic × CDB') (compute)
- Observation 3 [resultado: correlação 0.87] (data)
- Final Answer (relatório gerado) (ai)

### 🧰 Ferramentas Externas / External Tools

- Web Search API (external)
- Market Data Database (data)
- Code Executor (compute)

### Flows

- goal -> t1: goal perception
- t1 -> a1: decides action
- a1 -> ext_web: tool call
- ext_web -> o1: result
- o1 -> t2: new context
- t2 -> a2: decides action
- a2 -> ext_db: tool call
- ext_db -> o2: result
- o2 -> t3: new context
- t3 -> a3: decides action
- a3 -> ext_code: tool call
- ext_code -> o3: result
- o3 -> final: sufficient → answer

## The ReAct Loop: Thought, Action, Observation

The Yao et al. (2022) paper formalized something that seems obvious in hindsight but was an important conceptual leap: interleaving explicit reasoning with real actions drastically improves agent quality and traceability.

The loop has four phases that repeat:

**Perception:** The agent receives the current state — the original task, the history of previous actions, and accumulated observations. All of this is in the context window. It is like opening your IDE with the full diff of everything that has happened so far.

**Thought:** The LLM generates a natural-language reasoning step: *'I have the Selic data, but I do not yet have the CDB data. I need to query the market database.'* This step executes nothing — it is the model thinking out loud. This is crucial for debuggability: you can read the agent's reasoning like a log.

**Action:** Based on the reasoning, the LLM emits a structured tool call. The runtime executes it.

**Observation:** The tool result is injected back into context as a system message. The loop restarts with this new context.

The agent stops when it decides it has enough information to answer the original goal, or when it hits an iteration limit (your circuit breaker). Without that limit, you have an infinite loop — and an infinite API bill.

## When you need an agent — and when you do not

This is the most important question in this lesson, and the honest answer is: **in most real cases, you do not need an autonomous agent.**

Anthropic is direct in their engineering guide: *'When building applications with LLMs, we recommend finding the simplest solution possible, and only adding complexity when needed.'* This is not corporate modesty — it is solid architecture.

**Use workflow + RAG when:** the steps are known and fixed, the process can be modeled as a deterministic graph, cost and latency predictability is critical, and the domain has low error tolerance (regulated finance, healthcare, legal). A customer onboarding pipeline, a Q&A system over regulatory documents, a report generation with a fixed template — all of these are workflows, not agents.

**Use an agent when:** the task is open-ended and the required steps cannot be determined a priori, the problem requires adaptive reasoning based on intermediate results, and you accept cost and latency variability in exchange for flexibility. Exploratory research, debugging assistants, process automation where the sequence depends on the data found — these are genuine cases.

**The warning signal:** if you can draw the complete flow in a sequence diagram before writing code, it is probably a workflow. If the diagram has a question mark in the middle — *'depends on what the LLM finds'* — then you have an agent candidate. Always start with the simplest approach.

## Where to start — architect's mental checklist

- ✅ Before calling it an 'agent': are the task steps known a priori? If yes → workflow.
- ✅ Does the task only require information retrieval + text generation? If yes → simple RAG.
- ✅ If it is an agent: define the goal precisely. Vague goal = infinite loop.
- ✅ List required tools with precise schemas. Poorly described tool = parameter hallucination.
- ✅ Define a maximum iteration limit (circuit breaker). Without it, the agent may iterate indefinitely.
- ✅ Separate short-term memory (context window) from long-term memory (external storage) from the start.

> **Architect's Perspective — Fernando Azevedo:** After 16 years building financial systems where a bug can mean real money loss or regulatory violation, my instinctive reaction to 'autonomous agent' is caution — not skepticism, caution. The ReAct loop is genuinely powerful, but autonomy has a cost: you trade predictability for flexibility, and in regulated systems that trade-off needs to be explicitly justified. What convinces me to use an agent is not the technology itself, but the nature of the problem: if the decision flow is genuinely unknown a priori and depends on intermediate data, the agent is the right tool. If you can draw the flow before writing code, use a pipeline. Most cases I see labeled as 'agent' in practice are workflows with RAG — and there is nothing wrong with that. Simplicity is an architecture feature. That said, for exploratory tasks, unstructured data research, and complex process automation where the sequence depends on what you find along the way, the mental model in this lesson will serve as a solid foundation for everything that follows — including the production decisions on AWS we cover in Part 3.

## Verdict — What you should take from this lesson

An AI agent is not an LLM with a better prompt. It is a system with five precise ingredients — model, goal, tools, memory, and autonomy — where the LLM acts as a reasoning engine in an iterative loop (ReAct) that perceives, reasons, acts, and observes until the task is complete. The central mechanism that makes this possible is tool/function calling: the model emits structured JSON intents, the runtime executes them, and the result feeds the next reasoning cycle. The distinction between agent, workflow, and RAG is not semantic — it determines cost, predictability, failure surface, and observability requirements. Use the agent when autonomy is genuinely necessary; use the pipeline when the flow is known. In Part 2, we will go up a floor in the elevator and look at the architecture patterns that emerge when you compose multiple agents — orchestrator, subagents, parallelism, and the trade-offs of each approach. In Part 3, we come down to the factory floor: real production on AWS with Bedrock AgentCore.

## References

- [Anthropic — Building effective agents](https://www.anthropic.com/engineering/building-effective-agents)
- [Yao et al. — ReAct: Synergizing Reasoning and Acting in Language Models (arXiv 2210.03629)](https://arxiv.org/abs/2210.03629)
- [AWS — What are AI agents?](https://aws.amazon.com/what-is/ai-agents/)
- [Gregor Hohpe — The Architecture Elevator (book)](https://architectelevator.com/)

## Case sources

- [Anthropic — Building effective agents](https://www.anthropic.com/engineering/building-effective-agents)
- [Yao et al. — ReAct: Synergizing Reasoning and Acting in LLMs](https://arxiv.org/abs/2210.03629)
- [AWS — What are AI agents?](https://aws.amazon.com/what-is/ai-agents/)