Knowledge Bases and managed RAG on AWS
How to build the RAG pipeline from lesson 06 with managed services, including vector search.
In lesson 06 you understood what RAG is. In lesson 04 you saw how embeddings turn text into searchable vectors. Now let's close the loop: assembling that entire pipeline on AWS without managing an embedding server, vector index, or orchestrator — using Amazon Bedrock Knowledge Bases as the centerpiece.
What Bedrock Knowledge Bases does for you
A Knowledge Base (KB) in Bedrock is the RAG pipeline from lesson 06 packaged as a managed service. You point it at a data source — an S3 bucket, for example — and the service handles everything: reads the documents, splits them into chunks, generates embeddings with the model you choose (Titan Embeddings, Cohere, etc.), stores the vectors in a vector store, and exposes a semantic retrieval endpoint.
The flow has four internal steps:
- Ingestion: the service reads documents (PDF, DOCX, HTML, CSV…) from the configured source, such as S3, Confluence, or SharePoint.
- Chunking: text is split into pieces — fixed size, by sentence, hierarchical, or semantic. You choose the strategy.
- Embedding: each chunk becomes a vector using the configured embedding model.
- Indexing: vectors are written to the chosen vector store.
At query time, the KB receives the user's question, generates the question embedding using the same model, runs a similarity search on the index, and returns the most relevant chunks — ready to be injected into the LLM prompt. You write none of that code.
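To make the query path concrete, here is a minimal sketch of what "similarity search on the index" means. It is not what the service runs internally: the three-dimensional vectors, the `toy_index` list, and the function names are all made up for illustration, and a real vector store uses approximate nearest-neighbor indexes rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vector, index, k=2):
    """Return the k (score, chunk_text) pairs most similar to the query.

    `index` is a list of (chunk_text, vector) pairs standing in for the vector store.
    """
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional "embeddings"; real embedding models emit far more dimensions.
toy_index = [
    ("Refunds are processed within 5 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on holidays.", [0.0, 0.2, 0.9]),
    ("Refund requests need an order ID.", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # pretend embedding of "How do refunds work?"
results = top_k_chunks(query, toy_index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

The two refund chunks rank above the holiday chunk because their vectors point in nearly the same direction as the query vector. That ranking is the whole retrieval step; everything else in the KB exists to produce and store those vectors.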
Managed RAG pipeline: from source to agent
Full ingestion and retrieval flow with Bedrock Knowledge Bases. Solid arrows = ingestion path (offline). Dashed arrows = query path (runtime).
- Amazon S3 · PDF, DOCX, HTML
- Confluence / SharePoint · native connectors
- Ingestion · reads and splits into chunks
- Embedding · Titan / Cohere
- Knowledge Base · retrieval endpoint
- OpenSearch Serverless · recommended default
- Aurora PostgreSQL · (pgvector)
- Pinecone / Redis · external options
- Bedrock Agent · or direct call
- LLM (Claude, Llama…) · generates the final answer
Where the vectors live: choosing the vector store
Bedrock Knowledge Bases supports multiple vector backends. The choice impacts cost, latency, and operations.
OpenSearch Serverless (AOSS) is the default option and the one AWS integrates most deeply. You don't provision shards or instances — you pay per OCU (OpenSearch Compute Unit) consumed. For most projects with moderate document volume, it's the lowest-friction path. The watch-out: the minimum cost per collection can surprise you in dev/test environments that sit idle.
Aurora PostgreSQL with pgvector makes sense when you already have relational data in the same database. Vector search sits next to your transactional data with no extra hop. The downside is that you manage the cluster, and pgvector has scaling limits compared to dedicated engines.
Pinecone, Redis Enterprise, and MongoDB Atlas are external options supported via connector. Useful if you already have a contract or a team familiar with those services, but they add a dependency outside AWS.
My default recommendation: start with OpenSearch Serverless. It's the lowest-configuration path and integrates well with the rest of Bedrock. Switch to pgvector if AOSS cost doesn't work out or if you need joins with relational data.
In practice, the biggest mistake I see is creating one AOSS collection per environment (dev, staging, prod) without accounting for the minimum cost per collection: AOSS charges even when there are no queries. For dev, consider sharing a single collection and separating environments by index, or developing against a local pgvector instance in Docker. Reserve AOSS for staging and prod, where scale and native Bedrock integration justify the cost.
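A quick back-of-the-envelope sketch shows why idle collections add up. It assumes, as the paragraph above does, that every collection bills its own minimum capacity. The OCU count and the hourly rate below are placeholders, not AWS prices, and the function name is mine; check the current pricing page and how your account shares capacity before trusting any number.

```python
HOURS_PER_MONTH = 730  # common approximation for a billing month

def idle_monthly_cost(collections, min_ocus_per_collection, usd_per_ocu_hour):
    """Monthly cost of collections that receive no queries but still bill minimum OCUs."""
    return collections * min_ocus_per_collection * usd_per_ocu_hour * HOURS_PER_MONTH

# Placeholder inputs, NOT real AWS prices.
per_env = idle_monthly_cost(collections=3, min_ocus_per_collection=2, usd_per_ocu_hour=0.25)
without_dev = idle_monthly_cost(collections=2, min_ocus_per_collection=2, usd_per_ocu_hour=0.25)

print(f"dev + staging + prod: ${per_env:,.2f}/month")
print(f"staging + prod only:  ${without_dev:,.2f}/month")
```

The point is the shape of the formula rather than the figures: the cost scales with the number of collections and with hours elapsed, not with query volume.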
How the agent consumes the Knowledge Base
In lesson 07 you saw tool calling: the model decides when to call an external tool. A Knowledge Base in Bedrock is exactly that for a Bedrock Agent — it appears as a native tool, without you having to write the retrieval function.
When you associate a KB with an agent in Bedrock, the agent automatically gains the ability to query the base during the ReAct loop (lesson 11). The internal orchestrator decides when the user's question requires a KB search, fires the query, receives the chunks, and injects them into context before calling the LLM.
You can also call the KB directly via API — RetrieveAndGenerate or Retrieve separately — without needing a full agent. Retrieve returns raw chunks; RetrieveAndGenerate does the search and calls the LLM, delivering the final answer. Use Retrieve when you want to control the prompt yourself or when you need to post-process chunks before sending them to the model.
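Here is a minimal sketch of both calls using the boto3 `bedrock-agent-runtime` client. The helper names (`retrieve_chunks`, `answer_with_kb`, `build_prompt`), the prompt wording, and the IDs in the usage comment are my own placeholders. The client is passed in as a parameter so the functions can be exercised with a stub, without touching AWS.

```python
def retrieve_chunks(client, kb_id, question, top_k=5):
    """Call Retrieve and return (score, text) pairs for the raw chunks."""
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return [
        (hit.get("score"), hit["content"]["text"])
        for hit in response["retrievalResults"]
    ]

def answer_with_kb(client, kb_id, model_arn, question):
    """Call RetrieveAndGenerate and return only the generated answer text."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    )
    return response["output"]["text"]

def build_prompt(question, chunks):
    """Assemble your own prompt from raw chunks (the reason to choose Retrieve)."""
    context = "\n\n".join(text for _, text in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Usage against a real account (placeholder IDs, requires AWS credentials):
#   import boto3
#   client = boto3.client("bedrock-agent-runtime")
#   chunks = retrieve_chunks(client, "YOUR_KB_ID", "How do refunds work?")
#   prompt = build_prompt("How do refunds work?", chunks)
```

With `retrieve_chunks` you can filter, re-order, or log the chunks before they reach the model. With `answer_with_kb` you trade that control for a single call.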
This separation between retrieval and generation matters: it lets you evaluate (lesson 09) each stage independently — retrieval quality separate from generation quality.
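As one illustration of evaluating retrieval on its own, the sketch below computes a hit rate at k: the share of test questions for which at least one known-relevant chunk shows up in the top k results. The metric choice, the function name, and the chunk IDs are my assumptions, not something the lesson prescribes.

```python
def hit_rate_at_k(retrieved_per_query, relevant_per_query, k=5):
    """Fraction of queries whose top-k retrieved IDs include at least one relevant ID."""
    if not relevant_per_query:
        raise ValueError("need at least one query to evaluate")
    hits = 0
    for retrieved, relevant in zip(retrieved_per_query, relevant_per_query):
        if set(retrieved[:k]) & set(relevant):
            hits += 1
    return hits / len(relevant_per_query)

# Hypothetical evaluation set: chunk IDs returned by retrieval vs. labeled as relevant.
retrieved = [["c1", "c7", "c3"], ["c9", "c2", "c4"]]
relevant = [["c3"], ["c5"]]

print(hit_rate_at_k(retrieved, relevant, k=3))
```

Because this score never touches the LLM, a low value points at chunking, embeddings, or the index, while a high value with poor final answers points at the prompt or the generation model.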
Managed vs. DIY RAG: when to use each
| Criterion | Bedrock KB (managed) | DIY RAG (LangChain, LlamaIndex…) |
|---|---|---|
| Time to working state | Hours (console or IaC) | Days to weeks |
| Chunking/embedding control | Limited to service options | Full (any strategy) |
| Operational maintenance | Near zero | High (infra, versions, patches) |
| Portability (multi-cloud) | Low (coupled to AWS) | High (runs anywhere) |
| Cost at scale | Predictable, but with per-resource minimums | Optimizable, but requires engineering |
FinOps: the main cost drivers
Frequently asked questions
Do I need an agent to use Knowledge Bases?
No. You can call the KB directly via API (Retrieve or RetrieveAndGenerate) from any application. The agent is optional — it only adds automatic orchestration and the ReAct loop.
What happens when I update a document in S3?
The KB does not sync automatically by default. You need to trigger a sync operation (manual, scheduled via EventBridge, or via API). During sync, changed documents are re-ingested and old vectors are replaced.
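Triggering a sync via API could look like the sketch below, which uses the boto3 `bedrock-agent` client's `start_ingestion_job` and `get_ingestion_job` calls. The function name, the polling interval, and the set of terminal statuses are my assumptions; confirm the status values against the current API reference.

```python
import time

# Statuses treated as "done" in this sketch (assumed, verify against the API docs).
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}

def sync_knowledge_base(client, kb_id, data_source_id, poll_seconds=10, sleep=time.sleep):
    """Start an ingestion job and block until it reaches a terminal status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=data_source_id
    )["ingestionJob"]
    while job["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job["status"]

# Usage against a real account (placeholder IDs, requires AWS credentials):
#   import boto3
#   status = sync_knowledge_base(boto3.client("bedrock-agent"), "YOUR_KB_ID", "YOUR_DS_ID")
```

The same function can run from a scheduled job or from a handler that reacts to S3 uploads; the `sleep` parameter exists only so the loop can be tested without waiting.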
Can I use multiple Knowledge Bases in the same agent?
Yes. Each KB appears as a separate tool to the agent. You can have one KB for technical documentation and another for internal policies, for example. The agent decides which to query based on the question context.
What is the difference between fixed and semantic chunking?
Fixed chunking splits text into blocks of N tokens with configurable overlap — simple and predictable. Semantic chunking uses a model to identify natural meaning boundaries, producing more coherent chunks but at higher processing cost. For most cases, fixed chunking with 20% overlap is a good starting point.
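Fixed chunking with overlap is simple enough to sketch in a few lines. This is an illustration of the idea, not the service's implementation: it treats whitespace-separated words as tokens, whereas a real pipeline counts model tokens, and `fixed_chunks` and its defaults are names and values I picked.

```python
def fixed_chunks(tokens, chunk_size=300, overlap_ratio=0.2):
    """Split a token list into windows of chunk_size that overlap by overlap_ratio."""
    overlap = round(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # how far each new window advances
    if step <= 0:
        raise ValueError("overlap_ratio must leave a positive step")
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks

# Words stand in for tokens here.
words = "a b c d e f g h i j".split()
for chunk in fixed_chunks(words, chunk_size=5, overlap_ratio=0.2):
    print(" ".join(chunk))
```

With a chunk size of 5 and 20% overlap, each window starts 4 tokens after the previous one, so the last token of one chunk is repeated as the first token of the next. That repetition keeps a sentence that straddles a boundary retrievable from at least one chunk.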
When to use managed — and when not to
Bedrock Knowledge Bases is the right choice for most projects already running on AWS that need RAG without building a pipeline from scratch. The gain in delivery speed is real. The flexibility cost is also real: if you need custom chunking, re-rankers, or complex hybrid retrieval logic, you'll hit the managed service ceiling sooner than you expect. My rule: start managed, measure, and only build your own when you have concrete evidence that managed doesn't work — not out of anticipation.
Quick check
1. What does a managed Bedrock Knowledge Base give you?