Chunking: strategies and pitfalls
How to split documents without destroying context — the decision that most affects RAG quality.
The model doesn't see your document — it sees the chunk you cut out of it. If that chunk is poorly split, context-free, sliced through a table, or too large to be useful, the answer will be bad regardless of which LLM you use. Chunking is the architectural decision that most affects RAG quality, and it's the one most projects get wrong first.
Why chunking matters so much
Think of the RAG pipeline as a two-step funnel: first you retrieve the most relevant chunks, then the model generates the answer from them. The model can only reason about what's in the context window — and what's in the context window is exactly the chunks the retriever selected.
This creates a direct dependency: bad chunk → bad embedding → bad retrieval → bad answer. You can have the best model in the world and a perfectly tuned vector search, but if the retrieved chunk cut the sentence before its conclusion, or mixed two different topics together, the model will hallucinate or answer incompletely.
Beyond quality, chunk size directly affects cost. Large chunks increase the number of tokens sent to the model on every call. If you have 5 chunks of 1,000 tokens each in context, that's already 5,000 tokens just for context — before counting the prompt and the response. In production, with thousands of calls per day, that adds up fast.
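To make the arithmetic concrete, here is a minimal sketch of the context-token math. The 10,000 calls per day figure is a hypothetical volume chosen only for illustration, and the function names are mine, not part of any library.

```python
def context_tokens(num_chunks: int, tokens_per_chunk: int) -> int:
    """Tokens spent on retrieved context in a single call."""
    return num_chunks * tokens_per_chunk


def daily_context_tokens(num_chunks: int, tokens_per_chunk: int, calls_per_day: int) -> int:
    """Context tokens per day, before counting the prompt and the response."""
    return context_tokens(num_chunks, tokens_per_chunk) * calls_per_day


# The example from the text: 5 chunks of 1,000 tokens each.
per_call = context_tokens(5, 1_000)

# Hypothetical production volume (assumption): 10,000 calls per day.
per_day = daily_context_tokens(5, 1_000, 10_000)

print(per_call, per_day)
```

Halving the chunk size or the number of retrieved chunks halves this figure, which is why chunk size is a cost lever as well as a quality lever.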
The good news: chunking is a decision you can iterate on. There is no universally correct strategy — there is the right strategy for your document type and your use case.
Chunking strategies: from document to chunks
The same document can be split in very different ways. Each strategy produces chunks with distinct characteristics of size, semantic coherence, and structure preservation.
- Document · PDF / MD / HTML
- Fixed size · 512 tokens, overlap 50
- By sentence · or paragraph
- Recursive · \n\n → \n → space
- By structure · headings / tables
- Semantic · grouped by similarity
- Chunk · text + title + section + source
- Embedding · model
- Vector store · OpenSearch / Pinecone
The five main strategies — and when to use each one
Fixed size is almost everyone's starting point: you set a token limit (e.g., 512) and split the document at that interval, with a configurable overlap. It's simple, predictable, and easy to debug. The problem is that it ignores text structure — it can cut a sentence in half, separate a question from its answer, or mix the end of one topic with the start of the next.
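A minimal sketch of fixed-size chunking with overlap, assuming the document has already been tokenized into a list. The function name is hypothetical, and a real pipeline would use the tokenizer of its embedding model rather than a plain list of strings.

```python
def fixed_size_chunks(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split a token list into windows of `size` tokens sharing `overlap` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")

    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Stand-in tokens (assumption): integers rendered as strings.
tokens = [str(i) for i in range(1000)]
chunks = fixed_size_chunks(tokens)
print([len(c) for c in chunks])
```

Note that the window boundaries depend only on token positions, which is exactly why this strategy can cut a sentence in half.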
By sentence or paragraph respects the natural breaks in text. It's better than fixed size for flowing prose, but produces chunks of very variable size — a paragraph can have 30 words or 300.
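As a rough sketch, paragraph and sentence splitting can be done with two regular expressions. The sentence rule here is deliberately naive (it breaks after `.`, `!`, or `?` followed by whitespace, so abbreviations like "e.g." will trip it); treat it as an illustration, not a production sentence segmenter.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph: str) -> list[str]:
    """Naive sentence split: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


text = "Short one.\n\nThis is a longer paragraph. It has two sentences."
paragraphs = split_paragraphs(text)
print([len(p.split()) for p in paragraphs])  # word counts vary per paragraph
print(split_sentences(paragraphs[1]))
```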
Recursive is the most widely used in practice for general text. You define a hierarchy of separators (\n\n, then \n, then a period, then a space), and the algorithm splits on the highest-level separator first, falling back to the next one only when a piece still exceeds the size limit. This is the approach LangChain's RecursiveCharacterTextSplitter implements, and it works well for most documents.
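The idea can be sketched in plain Python. This is a simplified illustration of the recursive approach, not LangChain's actual implementation: it measures length in characters, has no overlap, and drops the separator at a chunk boundary. The function name and defaults are assumptions.

```python
def recursive_split(text: str, max_len: int = 200,
                    separators: tuple = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split on the highest-level separator, recursing only into oversized pieces."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard cut.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, rest = separators[0], separators[1:]
    pieces = [p for p in text.split(sep) if p.strip()]

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= max_len:
            current = piece
        else:
            # Piece is still too long: try the next separator down.
            chunks.extend(recursive_split(piece, max_len, rest))
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("One two three. Four five six. Seven eight nine.", max_len=30))
```

The first two sentences fit in one chunk together, so they stay merged; only the third moves to a new chunk.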
By structure is the right choice when the document has explicit hierarchy: Markdown headings, HTML sections, slides. You keep each section as a minimum unit and inherit the title as metadata automatically. For technical documents with tables and code blocks, this strategy avoids destructive cuts.
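For Markdown, a structural splitter can be sketched in a few lines, assuming headings use the `#` syntax. The function names are hypothetical, and this version does not yet protect `#` lines inside code fences, so a real pipeline would isolate code blocks first.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def _append_section(sections: list, heading, body: list) -> None:
    """Store a section only if it has non-empty text."""
    text = "\n".join(body).strip()
    if text:
        sections.append({"heading": heading, "text": text})


def split_by_headings(markdown: str) -> list[dict]:
    """One chunk per section; the heading is inherited as metadata."""
    sections = []
    heading, body = None, []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            _append_section(sections, heading, body)
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    _append_section(sections, heading, body)
    return sections


doc = "# Intro\nHello.\n\n## Setup\nInstall it.\nRun it.\n"
for section in split_by_headings(doc):
    print(section)
```

Sections that exceed the size limit can then be passed through a recursive or fixed-size splitter while keeping the same heading metadata.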
Semantic groups sentences by embedding similarity — semantically cohesive chunks, but with higher processing cost at indexing time. It makes sense for large, heterogeneous corpora where structure is unreliable.
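A toy sketch of the grouping logic follows. To keep it self-contained, `toy_embed` is a stand-in (word counts) for a real embedding model, and the 0.2 threshold is an arbitrary assumption; in practice you would call your embedding model and tune the threshold against your evaluation set.

```python
import math
import re
from collections import Counter


def toy_embed(sentence: str) -> Counter:
    """Stand-in for a real embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Start a new chunk whenever a sentence is dissimilar to the previous one."""
    chunks = []
    for sentence in sentences:
        if chunks and cosine(toy_embed(chunks[-1][-1]), toy_embed(sentence)) >= threshold:
            chunks[-1].append(sentence)
        else:
            chunks.append([sentence])
    return chunks


sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Stock prices fell sharply today.",
    "Stock prices recovered later.",
]
print(semantic_chunks(sentences))
```

The extra indexing cost mentioned above comes from the fact that every sentence must be embedded once just to decide where the boundaries go.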
Chunking strategy comparison
| Strategy | Semantic coherence | Predictable size | Implementation cost | Best for |
|---|---|---|---|---|
| Fixed size | Low | High | Minimal | Fast prototyping |
| Sentence/paragraph | Medium | Low | Low | Flowing prose, articles |
| Recursive | Medium-high | Medium | Low | General text, documentation |
| Structural | High | Variable | Medium | Technical docs, wikis, HTML |
| Semantic | High | Low | High | Large, heterogeneous corpora |
Overlap, metadata, and the pitfalls that destroy quality
Overlap is the number of tokens shared between consecutive chunks. If you use 512-token chunks with 50-token overlap, the last 50 tokens of chunk N also appear at the start of chunk N+1. This seems wasteful, but it has a clear purpose: preventing important information from falling in the "seam" between two chunks and being retrieved by neither. For most cases, an overlap of 10% to 15% of the chunk size is enough. Beyond that, you start duplicating content in the context.
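The "seam" problem is easy to reproduce with a toy example. In this sketch the tokens are whitespace-separated words and the sizes are tiny so the effect is visible; the sentence and the helper names are made up for illustration.

```python
def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Fixed-size windows that share `overlap` tokens with their neighbor."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks


def contains_phrase(chunk: list[str], phrase: list[str]) -> bool:
    """True if `phrase` appears as a contiguous run inside `chunk`."""
    n = len(phrase)
    return any(chunk[i:i + n] == phrase for i in range(len(chunk) - n + 1))


tokens = "the refund window is thirty days after delivery for all orders".split()
phrase = ["thirty", "days"]

without_overlap = chunk_tokens(tokens, size=5, overlap=0)
with_overlap = chunk_tokens(tokens, size=5, overlap=2)

# Without overlap the phrase is cut at the seam; with overlap one chunk holds it whole.
print(any(contains_phrase(c, phrase) for c in without_overlap))
print(any(contains_phrase(c, phrase) for c in with_overlap))
```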
Metadata in the chunk is as important as the text itself. Each chunk should carry: document title, source section or heading, URL or file path, and ideally the creation or update date. This metadata serves two purposes: filtering in search (we'll cover this in lesson 06) and building citations in the response (lesson 11). If you don't preserve provenance at chunking time, it will be impossible to reconstruct it later.
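One way to carry this metadata is a small record type created at chunking time. The field names, the record shape, and the sample values below are assumptions for illustration; each vector store has its own schema, so adapt the mapping to the one you use.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    text: str
    title: str                        # document title
    section: str                      # source section or heading
    source: str                       # URL or file path
    updated_at: Optional[str] = None  # ISO 8601 date, if known


def to_record(chunk: Chunk) -> dict:
    """Separate the text to embed from the metadata used for filtering."""
    fields = asdict(chunk)
    text = fields.pop("text")
    return {"text": text, "metadata": fields}


def format_citation(chunk: Chunk) -> str:
    """Build a human-readable citation from the preserved provenance."""
    return f"{chunk.title}, {chunk.section} ({chunk.source})"


# Hypothetical chunk for illustration.
chunk = Chunk(
    text="Refunds are accepted within 30 days.",
    title="Returns policy",
    section="Refund window",
    source="docs/returns.md",
    updated_at="2024-01-15",
)
print(to_record(chunk))
print(format_citation(chunk))
```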
The most common pitfalls come in three forms. Chunks that are too large (above 1,000 tokens) inject noise into the context: the model receives irrelevant text alongside the relevant passage, which dilutes the answer. Chunks that are too small (below 100 tokens) lose context: an isolated sentence rarely carries enough meaning for the model to answer well. The worst case is cutting a table or code block in half. A table is a header plus rows, and once separated, neither half is intelligible. Code has dependencies between lines. Always treat tables and code as atomic units.
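A minimal sketch of isolating atomic units in Markdown before any size-based split. It assumes tables are lines starting with `|` and code blocks are delimited by triple-backtick fences; other formats (HTML tables, indented code) would need their own detection, and the function name is hypothetical.

```python
FENCE = "`" * 3  # Markdown code fence marker


def split_atomic_blocks(markdown: str) -> list[tuple[str, str]]:
    """Return (kind, text) blocks where kind is 'prose', 'table', or 'code'."""
    blocks = []  # each entry is [kind, list_of_lines]
    in_code = False
    for line in markdown.splitlines():
        is_fence = line.strip().startswith(FENCE)
        if in_code:
            blocks[-1][1].append(line)
            if is_fence:
                in_code = False  # closing fence ends the code block
            continue
        if is_fence:
            blocks.append(["code", [line]])
            in_code = True
            continue
        kind = "table" if line.lstrip().startswith("|") else "prose"
        if blocks and blocks[-1][0] == kind:
            blocks[-1][1].append(line)
        else:
            blocks.append([kind, [line]])

    joined = [(kind, "\n".join(lines).strip()) for kind, lines in blocks]
    return [(kind, text) for kind, text in joined if text]


sample = "\n".join([
    "Intro text.",
    "",
    "| a | b |",
    "|---|---|",
    "| 1 | 2 |",
    "",
    FENCE + "python",
    "x = 1",
    "",
    "y = 2",
    FENCE,
    "Outro.",
])
for kind, text in split_atomic_blocks(sample):
    print(kind, repr(text))
```

Only the `prose` blocks should go through the size-based splitter; `table` and `code` blocks are emitted whole, even when they exceed the target size.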
In practice, I almost always start with recursive chunking at 512 tokens and 10% overlap. It's the safest starting point for general text — it works reasonably well before any optimization. Then I look at the actual documents: if they have clear headings, I switch to structural chunking and inherit the heading as metadata. If they have tables or code, I isolate those sections before any split. I only invest in semantic chunking when the corpus is large, heterogeneous, and recursive results have already hit a ceiling. Order matters: don't optimize chunking before you have a baseline and an evaluation metric — otherwise you're tuning in the dark.
Frequently asked questions about chunking
What is the ideal chunk size?
There is no universal number. For technical documentation, 400–600 tokens with 50–80 token overlap is a solid starting point. For more narrative texts, whole paragraphs often work better than a fixed token limit. The right size is whatever maximizes your evaluation metric — which is why lesson 08 (evaluation) is the mandatory companion to this one.
Do I need to re-index everything if I change the chunking strategy?
Yes. Chunking and embedding are inseparable — the vector represents the text of that specific chunk. If you change how you split the text, the old vectors become inconsistent with the new ones. A full re-index is required. That's why it's worth having an automated indexing pipeline from the start.
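One simple way to automate this, sketched under the assumption that the pipeline stores a fingerprint of its chunking settings next to the index: if the fingerprint changes, trigger a full re-index. The config keys and function names are hypothetical.

```python
import hashlib
import json


def chunking_fingerprint(config: dict) -> str:
    """Stable hash of the chunking settings (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_reindex(stored_fingerprint: str, config: dict) -> bool:
    """True when the current settings differ from the ones used to build the index."""
    return stored_fingerprint != chunking_fingerprint(config)


# Hypothetical settings for illustration.
indexed_with = {"strategy": "recursive", "size": 512, "overlap": 50}
stored = chunking_fingerprint(indexed_with)

print(needs_reindex(stored, {"overlap": 50, "size": 512, "strategy": "recursive"}))
print(needs_reindex(stored, {**indexed_with, "size": 400}))
```

In a real pipeline the embedding model name would belong in the same fingerprint, since changing the model also invalidates the stored vectors.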
Does Amazon Bedrock Knowledge Bases do chunking automatically?
Yes. Knowledge Bases offers fixed-size, hierarchical, and semantic chunking as configurable options, alongside a default mode and a no-chunking mode. It's convenient, but you have less fine-grained control over overlap, table handling, and custom metadata than in a pipeline you write yourself. We'll detail this in lesson 09.