Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

The AI Architect Track

Module 4 · Architecture on AWS · Lesson 18/22

Knowledge Bases and managed RAG on AWS

How to build the RAG pipeline from lesson 06 with managed services, including vector search.

5 min read

In lesson 06 you understood what RAG is. In lesson 04 you saw how embeddings turn text into searchable vectors. Now let's close the loop: assembling that entire pipeline on AWS without managing an embedding server, vector index, or orchestrator — using Amazon Bedrock Knowledge Bases as the centerpiece.

What Bedrock Knowledge Bases does for you

A Knowledge Base (KB) in Bedrock is the RAG pipeline from lesson 06 packaged as a managed service. You point it at a data source — an S3 bucket, for example — and the service handles everything: reads the documents, splits them into chunks, generates embeddings with the model you choose (Titan Embeddings, Cohere, etc.), stores the vectors in a vector store, and exposes a semantic retrieval endpoint.

The flow has four internal steps:

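Ingestion: the service reads content from the source, either files in S3 (PDF, DOCX, HTML, CSV…) or pages pulled through connectors such as Confluence and SharePoint.
Chunking: text is split into pieces using the strategy you choose: fixed size, hierarchical, semantic, or no chunking at all (each file becomes one chunk).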
Embedding: each chunk becomes a vector using the configured embedding model.
Indexing: vectors are written to the chosen vector store.
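As a rough illustration of the "you choose the strategy" part, the sketch below builds the chunking section of a data source definition as a plain dictionary. The field names follow the `vectorIngestionConfiguration` structure accepted by the `bedrock-agent` `create_data_source` call; the helper name and the 300-token / 20% values are assumptions chosen for the example, not recommendations from the service.

```python
def build_fixed_chunking_config(max_tokens: int = 300, overlap_percentage: int = 20) -> dict:
    """Build a fixed-size chunking configuration (hypothetical helper)."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if not 1 <= overlap_percentage <= 99:
        raise ValueError("overlap_percentage must be between 1 and 99")
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": max_tokens,
                "overlapPercentage": overlap_percentage,
            },
        }
    }


# Example values only (assumption): 300-token chunks with 20% overlap.
ingestion_config = build_fixed_chunking_config(max_tokens=300, overlap_percentage=20)

# With AWS credentials, the dictionary would be passed along these lines:
# import boto3
# boto3.client("bedrock-agent").create_data_source(
#     knowledgeBaseId="KB_ID_PLACEHOLDER",
#     name="docs",
#     dataSourceConfiguration={...},  # S3 or connector settings, not shown here
#     vectorIngestionConfiguration=ingestion_config,
# )
```

Keeping the configuration in code or IaC makes the chunking choice reviewable, which matters because changing it later means re-ingesting everything.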

At query time, the KB receives the user's question, generates the question embedding using the same model, runs a similarity search on the index, and returns the most relevant chunks — ready to be injected into the LLM prompt. You write none of that code.
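To make the query path concrete, here is a toy in-memory version of "embed the question, search by similarity, return the top chunks". It is a sketch for intuition only and says nothing about how the service is implemented: `toy_embed` is a hypothetical stand-in for a real embedding model (it just counts words from a tiny vocabulary), and the index is a plain Python list.

```python
import math

# Hypothetical vocabulary; a real embedding model produces dense learned vectors.
VOCAB = ["refund", "policy", "vector", "index", "cost", "kafka"]


def toy_embed(text: str) -> list[float]:
    """Stand-in embedding: counts of vocabulary words (illustration only)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, list[float]]]:
    """Ingestion side: embed every chunk once and store the vector with its text."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, list[float]]], question: str, top_k: int = 2) -> list[str]:
    """Query side: embed the question with the same function and rank chunks."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


index = build_index([
    "refund policy for card disputes",
    "vector index cost per month",
    "kafka topic retention settings",
])
top = retrieve(index, "what is the refund policy", top_k=1)
```

The detail worth noticing is that the question and the chunks go through the same embedding function; mixing models between ingestion and query would make the similarity scores meaningless.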

Managed RAG pipeline: from source to agent

Full ingestion and retrieval flow with Bedrock Knowledge Bases. Solid arrows = ingestion path (offline). Dashed arrows = query path (runtime).

📦 Data sources

Amazon S3 · PDF, DOCX, HTML
Confluence / SharePoint · native connectors

🟧 AWS — Bedrock Knowledge Base

Ingestion · reads and splits into chunks
Embedding · Titan / Cohere
Knowledge Base · retrieval endpoint

🗄️ Vector Store

OpenSearch Serverless · recommended default
Aurora PostgreSQL · (pgvector)
Pinecone / Redis · external options

🤖 Consumer

Bedrock Agent · or direct call
LLM (Claude, Llama…) · generates the final answer

Where the vectors live: choosing the vector store

Bedrock Knowledge Bases supports multiple vector backends. The choice impacts cost, latency, and operations.

OpenSearch Serverless (AOSS) is the default option and the one AWS integrates most deeply. You don't provision shards or instances — you pay per OCU (OpenSearch Compute Unit) consumed. For most projects with moderate document volume, it's the lowest-friction path. The watch-out: the minimum cost per collection can surprise you in dev/test environments that sit idle.

Aurora PostgreSQL with pgvector makes sense when you already have relational data in the same database. Vector search sits next to your transactional data with no extra hop. The downside is that you manage the cluster, and pgvector has scaling limits compared to dedicated engines.

Pinecone, Redis Enterprise, and MongoDB Atlas are external options supported via connector. Useful if you already have a contract or a team familiar with those services, but they add a dependency outside AWS.

My default recommendation: start with OpenSearch Serverless. It's the lowest-configuration path and integrates well with the rest of Bedrock. Switch to pgvector if AOSS cost doesn't work out or if you need joins with relational data.

In practice: OpenSearch Serverless in dev vs prod

In practice, the biggest mistake I see is creating one AOSS collection per environment (dev, staging, prod) without accounting for the minimum cost per collection. AOSS bills its minimum capacity even when there are no queries. For dev, consider a single shared collection that uses separate indexes as namespaces, or switch to a local pgvector instance in Docker while developing. Reserve AOSS for staging and prod, where scale and native Bedrock integration justify the cost.
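A quick back-of-the-envelope calculation shows why the per-collection minimum matters. The numbers below are placeholders, not quoted AWS prices: both the minimum OCUs per collection and the OCU-hour price are assumptions, so check the current pricing page before using them.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def idle_monthly_cost(collections: int, min_ocus: float, price_per_ocu_hour: float) -> float:
    """Fixed monthly cost of collections that receive no traffic at all."""
    return collections * min_ocus * price_per_ocu_hour * HOURS_PER_MONTH


# Placeholder inputs (assumptions): 1 OCU minimum per collection, 0.24 per OCU-hour.
MIN_OCUS = 1.0
PRICE_PER_OCU_HOUR = 0.24

# One collection each for dev, staging, and prod.
three_collections = idle_monthly_cost(3, MIN_OCUS, PRICE_PER_OCU_HOUR)

# Dev moved to local pgvector; AOSS kept only for staging and prod.
two_collections = idle_monthly_cost(2, MIN_OCUS, PRICE_PER_OCU_HOUR)

monthly_savings = three_collections - two_collections
```

Whatever the real rates are, the shape of the result is the same: the cost grows linearly with the number of collections, independent of query volume.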

How the agent consumes the Knowledge Base

In lesson 07 you saw tool calling: the model decides when to call an external tool. A Knowledge Base in Bedrock is exactly that for a Bedrock Agent — it appears as a native tool, without you having to write the retrieval function.

When you associate a KB with an agent in Bedrock, the agent automatically gains the ability to query the base during the ReAct loop (lesson 11). The internal orchestrator decides when the user's question requires a KB search, fires the query, receives the chunks, and injects them into context before calling the LLM.
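The association itself is a single API call. The sketch below only builds the request arguments, so it runs without AWS credentials; the parameter names follow the `bedrock-agent` `associate_agent_knowledge_base` call, while the helper name, IDs, and descriptions are hypothetical placeholders. The description deserves care, since it is the text that tells the agent what the Knowledge Base should be used for.

```python
def build_association_request(agent_id: str, kb_id: str, description: str) -> dict:
    """Arguments for attaching one Knowledge Base to an agent's working draft."""
    if not description.strip():
        raise ValueError("a description is needed so the agent knows when to query the KB")
    return {
        "agentId": agent_id,
        "agentVersion": "DRAFT",
        "knowledgeBaseId": kb_id,
        "description": description,
        "knowledgeBaseState": "ENABLED",
    }


# Placeholder IDs and descriptions (assumptions for illustration).
knowledge_bases = {
    "KB_DOCS_PLACEHOLDER": "Technical documentation for internal platforms.",
    "KB_POLICY_PLACEHOLDER": "Internal policies and compliance rules.",
}
requests = [
    build_association_request("AGENT_ID_PLACEHOLDER", kb_id, text)
    for kb_id, text in knowledge_bases.items()
]

# With AWS credentials, each request would be sent along these lines:
# import boto3
# client = boto3.client("bedrock-agent")
# for request in requests:
#     client.associate_agent_knowledge_base(**request)
```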

You can also call the KB directly via API — RetrieveAndGenerate or Retrieve separately — without needing a full agent. Retrieve returns raw chunks; RetrieveAndGenerate does the search and calls the LLM, delivering the final answer. Use Retrieve when you want to control the prompt yourself or when you need to post-process chunks before sending them to the model.

This separation between retrieval and generation matters: it lets you evaluate (lesson 09) each stage independently — retrieval quality separate from generation quality.
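The two direct calls can be sketched as follows. The request shapes follow the `bedrock-agent-runtime` `retrieve` and `retrieve_and_generate` operations; the helper names, IDs, model ARN, and the sample response values are placeholders invented for the example. The network calls are left as comments so the block runs offline.

```python
def build_retrieve_request(kb_id: str, question: str, top_k: int = 5) -> dict:
    """Arguments for Retrieve: raw chunks only, no generation."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": question},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    }


def build_rag_request(kb_id: str, question: str, model_arn: str) -> dict:
    """Arguments for RetrieveAndGenerate: search plus LLM call in one step."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }


def chunks_above(response: dict, min_score: float) -> list[str]:
    """Post-process a Retrieve response: keep chunk text at or above a score."""
    return [
        result["content"]["text"]
        for result in response.get("retrievalResults", [])
        if result.get("score", 0.0) >= min_score
    ]


# Sample shaped like a Retrieve response (values invented for illustration).
sample_response = {
    "retrievalResults": [
        {"content": {"text": "Refunds are processed within five days."}, "score": 0.82},
        {"content": {"text": "Office opening hours."}, "score": 0.31},
    ]
}
kept = chunks_above(sample_response, min_score=0.5)

# With AWS credentials:
# import boto3
# runtime = boto3.client("bedrock-agent-runtime")
# response = runtime.retrieve(**build_retrieve_request("KB_ID_PLACEHOLDER", "How long do refunds take?"))
# answer = runtime.retrieve_and_generate(
#     **build_rag_request("KB_ID_PLACEHOLDER", "How long do refunds take?", "MODEL_ARN_PLACEHOLDER")
# )["output"]["text"]
```

Filtering on the score, as `chunks_above` does, is one example of the post-processing that only the Retrieve path allows, and the filtered list is also what you would feed into a retrieval-quality evaluation.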

Managed vs. DIY RAG: when to use each

Criterion	Bedrock KB (managed)	DIY RAG (LangChain, LlamaIndex…)
Time to working state	Hours (console or IaC)	Days to weeks
Chunking/embedding control	Limited to service options	Full — any strategy
Operational maintenance	Near zero	High (infra, versions, patches)
Portability (multi-cloud)	Low — coupled to AWS	High — runs anywhere
Cost at scale	Predictable, but with per-resource minimums	Optimizable, but requires engineering

FinOps: the main cost drivers

Embedding at ingestion: you pay per token processed when generating vectors. Large documents or frequent re-ingestions increase this cost — control sync frequency.

Idle vector store: OpenSearch Serverless charges per OCU even without queries. In dev/test, delete or consolidate collections to avoid unnecessary fixed cost.

Embedding at query time: each user question generates an embedding. At high query volume, query-time embedding cost can exceed ingestion cost.

LLM tokens: retrieved chunks are injected into context and charged as input tokens. Larger chunks = more tokens = higher cost per query.

Re-indexing: changes to chunking strategy require full re-ingestion. Define the strategy before going to production.
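The LLM-token driver is easy to estimate with simple arithmetic. The sketch below is a rough model under stated assumptions: the token counts, query volume, and price per 1,000 input tokens are all placeholder values, and it ignores output tokens and embedding cost.

```python
def input_tokens_per_query(top_k: int, chunk_tokens: int, question_tokens: int,
                           prompt_overhead_tokens: int = 0) -> int:
    """Input tokens sent to the LLM: retrieved chunks plus question plus prompt template."""
    return top_k * chunk_tokens + question_tokens + prompt_overhead_tokens


def monthly_input_cost(queries_per_month: int, tokens_per_query: int,
                       price_per_1k_tokens: float) -> float:
    """Monthly cost of LLM input tokens only."""
    return queries_per_month * tokens_per_query / 1000 * price_per_1k_tokens


# Placeholder values (assumptions): 5 chunks per query, 50-token questions.
small_chunks = input_tokens_per_query(top_k=5, chunk_tokens=300, question_tokens=50)
large_chunks = input_tokens_per_query(top_k=5, chunk_tokens=1000, question_tokens=50)

# Placeholder volume and price (assumptions): 100,000 queries, 0.003 per 1k input tokens.
cost_small = monthly_input_cost(100_000, small_chunks, 0.003)
cost_large = monthly_input_cost(100_000, large_chunks, 0.003)
```

With these placeholder inputs, moving from 300-token to 1,000-token chunks roughly triples the input-token bill, which is why chunk size and the number of retrieved results are worth tuning together.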

Frequently asked questions

Do I need an agent to use Knowledge Bases?

No. You can call the KB directly via API (Retrieve or RetrieveAndGenerate) from any application. The agent is optional — it only adds automatic orchestration and the ReAct loop.

What happens when I update a document in S3?

The KB does not sync automatically by default. You need to trigger a sync operation (manual, scheduled via EventBridge, or via API). During sync, changed documents are re-ingested and old vectors are replaced.
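Triggering the sync via API can be sketched like this. The parameter names and status values follow the `bedrock-agent` `start_ingestion_job` and `get_ingestion_job` operations; the helper names and IDs are hypothetical, and the calls themselves are left as comments so the block runs without credentials.

```python
# Statuses after which an ingestion job will not change any more.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def build_sync_request(kb_id: str, data_source_id: str) -> dict:
    """Arguments for starting an ingestion (sync) job on one data source."""
    return {"knowledgeBaseId": kb_id, "dataSourceId": data_source_id}


def is_finished(job: dict) -> bool:
    """True when the job reached a terminal status."""
    return job["status"] in TERMINAL_STATUSES


sync_request = build_sync_request("KB_ID_PLACEHOLDER", "DATA_SOURCE_ID_PLACEHOLDER")

# With AWS credentials, a simple polling loop could look like this:
# import time
# import boto3
# agent = boto3.client("bedrock-agent")
# job = agent.start_ingestion_job(**sync_request)["ingestionJob"]
# while not is_finished(job):
#     time.sleep(10)
#     job = agent.get_ingestion_job(
#         ingestionJobId=job["ingestionJobId"], **sync_request
#     )["ingestionJob"]
```

The same request can be issued from a scheduled job or from a function reacting to S3 events, depending on how fresh the index needs to be.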

Can I use multiple Knowledge Bases in the same agent?

Yes. Each KB appears as a separate tool to the agent. You can have one KB for technical documentation and another for internal policies, for example. The agent decides which to query based on the question context.

What is the difference between fixed and semantic chunking?

Fixed chunking splits text into blocks of N tokens with configurable overlap — simple and predictable. Semantic chunking uses a model to identify natural meaning boundaries, producing more coherent chunks but at higher processing cost. For most cases, fixed chunking with 20% overlap is a good starting point.
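Fixed chunking with overlap is simple enough to sketch in a few lines. This is an illustration of the idea, not the service's implementation: it treats list items as tokens (a real pipeline would use the embedding model's tokenizer), and the function name is made up for the example.

```python
def fixed_chunks(tokens: list[str], size: int, overlap_ratio: float = 0.2) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing a prefix with the previous one."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(size * overlap_ratio)
    step = size - overlap  # always at least 1 because overlap < size
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Twelve stand-in tokens, windows of 5 with 20% overlap (1 shared token).
tokens = [f"t{i}" for i in range(12)]
chunks = fixed_chunks(tokens, size=5, overlap_ratio=0.2)
```

Here each chunk starts with the last token of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk more often than it would without overlap.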

When to use managed — and when not to

✅ Default for AWS projects: revisit if

Bedrock Knowledge Bases is the right choice for most projects already running on AWS that need RAG without building a pipeline from scratch. The gain in delivery speed is real. The flexibility cost is also real: if you need custom chunking, re-rankers, or complex hybrid retrieval logic, you'll hit the managed service ceiling sooner than you expect. My rule: start managed, measure, and only build your own when you have concrete evidence that managed doesn't work — not out of anticipation.
