Embeddings and vector space: the basis of semantic search
How to turn meaning into numbers and why that unlocks search by meaning (and RAG).
To a computer, text is just a sequence of characters — with no notion that 'dog' and 'canine' mean the same thing, or that 'bank account' and 'river bank' are entirely different worlds. Embeddings solve exactly that: they translate meaning into numeric coordinates, unlocking search by sense, recommendation, and — as we'll see in lesson 06 — RAG.
What an embedding actually is
An embedding is a vector — an ordered list of floating-point numbers — that represents a piece of text (or image, or audio) such that meaning is encoded in the position of that vector in a high-dimensional space.
Think of it like a map. Every city is a point, and geographically close cities sit near each other. Now imagine a map whose axes are not latitude and longitude but meaning. 'King' and 'Queen' are close. 'King' and 'Apple' are far. 'Paris' is close to 'France' and also close to 'European capital'.
In practice, this space has hundreds or thousands of dimensions (common models use 768, 1536, or 3072 dimensions). You can't visualize it directly, but the geometry works: texts with similar meaning occupy neighboring regions, and unrelated texts sit far apart. One caveat: opposites such as 'hot' and 'cold' often land fairly close together, because they appear in similar contexts.
The embedding model learns these coordinates during training — exactly the process we covered in lesson 02. It is trained on massive text corpora so that words and phrases appearing in similar contexts end up with similar vectors. The result is that position in the space carries real semantics, not just frequency statistics.
Vector space: proximity = similarity of meaning
Each text becomes a point in space. Distance between points reflects difference in meaning. Semantically close texts form natural clusters — even without sharing a single word.
- "dog" · [0.82, 0.14, ...]
- "canine" · [0.81, 0.15, ...]
- "cat" · [0.78, 0.21, ...]
- "financial bank" ("banco financeiro") · [0.11, 0.88, ...]
- "investment" · [0.09, 0.91, ...]
- "park bench" ("banco de praça") · [0.55, 0.12, ...]
- "tree" · [0.58, 0.10, ...]
- User query · "pet"
- Results · dog, canine, cat
Cosine similarity: the ruler of vector space
When you want to know if two texts are semantically close, you compare their vectors. The most common metric is cosine similarity — but you don't need the formula to understand what it does.
Imagine two vectors as arrows pointing from the origin of a graph. If both arrows point in the same direction, the angle between them is zero: maximum similarity (value 1). If they point in perpendicular directions, the angle is 90°: no relation (value 0). If they point in opposite directions: negative similarity (value -1).
What makes cosine useful is that it measures direction, not length. A short text and a long text on the same topic may have vectors of different magnitudes, but they point in roughly the same direction, and cosine captures that.
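If you do want to see the arithmetic behind the arrow picture, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the tiny 2-D vectors are illustrative assumptions, not part of any library; real embeddings have hundreds of dimensions, but the computation is the same.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors, chosen only to show the three cases.
same = cosine_similarity([1.0, 2.0], [1.0, 2.0])            # same direction
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])   # 90 degrees apart
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])      # opposite direction
scaled = cosine_similarity([1.0, 2.0], [10.0, 20.0])        # longer arrow, same direction
```

Up to floating-point rounding, `same` and `scaled` come out as 1, `perpendicular` as 0, and `opposite` as -1. The `scaled` case is the point about length: multiplying a vector by 10 changes its magnitude but not its score.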
In practice: you generate the embedding of the user's query, compute cosine similarity against all embeddings in your index, and return the closest ones. That is semantic search. It finds 'canine' when the user types 'dog', and it finds a document titled 'account cancellation procedure' when the user asks 'how to cancel my subscription', even though the query and the document share no exact words.
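The search loop described above can be sketched in a few lines. Everything here is an assumption for illustration: the 3-D vectors are hand-written stand-ins (a real system would get them from an embedding model), and `INDEX` and `semantic_search` are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical hand-written vectors; real ones come from an embedding model.
INDEX = {
    "dog": [0.90, 0.10, 0.20],
    "canine": [0.88, 0.12, 0.22],
    "cat": [0.85, 0.10, 0.10],
    "investment": [0.05, 0.95, 0.05],
    "park bench": [0.10, 0.05, 0.90],
    "tree": [0.15, 0.02, 0.92],
}

def semantic_search(query_vector, index, k=3):
    """Score every entry against the query and return the k best texts."""
    scored = [(cosine(query_vector, vec), text) for text, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]

query_vector = [0.90, 0.10, 0.10]  # stand-in for the embedding of "pet"
results = semantic_search(query_vector, INDEX)
```

With these toy values, `results` contains the three animal entries, although the query shares no word with any of them. Note that this version compares the query against every vector, which is fine for a handful of entries and too slow for a large index.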
A common mistake is confusing the embedding model with the LLM that generates text. They are different models with different goals. The LLM predicts the next token (lesson 03). The embedding model transforms text into a vector — it generates nothing, it only encodes. On AWS, you'll use Amazon Titan Embeddings or Cohere Embed via Bedrock to generate vectors, and a model like Claude or Llama to generate responses. They work together in RAG, but they are separate pieces. Conflating the two causes confusion when choosing, scaling, and pricing.
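To make the "it only encodes" point concrete, here is a hedged sketch of calling an embedding model through the Bedrock runtime API with boto3. The helper name `embed_text` is hypothetical, and the model ID and the request and response fields (`inputText`, `embedding`) reflect the Titan Text Embeddings format as I understand it; check the Bedrock documentation for the model you actually use.

```python
import json

def embed_text(client, text, model_id="amazon.titan-embed-text-v2:0"):
    """Return the embedding vector for `text`.

    `client` is expected to be a boto3 "bedrock-runtime" client.
    The body format below is assumed to match Titan Text Embeddings;
    other embedding models use different request fields.
    """
    response = client.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
        accept="application/json",
        contentType="application/json",
    )
    # The response body is a stream containing JSON.
    payload = json.loads(response["body"].read())
    return payload["embedding"]

# Usage (requires AWS credentials and access to the model):
# import boto3
# client = boto3.client("bedrock-runtime")
# vector = embed_text(client, "how to cancel my subscription")
```

The call returns a list of floats and nothing else. Generating an answer is a separate call to a separate model.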
Vector store: where embeddings live and how they are queried
Generating an embedding is half the work. The other half is storing and querying those vectors efficiently. That's where the vector store (or vector index) comes in.
A vector store is a database optimized for one specific operation: given a query vector, find the K closest vectors in the index — the operation called kNN (k-nearest neighbors) or, in the faster approximate version, ANN (approximate nearest neighbors).
Algorithms like HNSW (Hierarchical Navigable Small World) build navigation graphs that allow finding close neighbors in sub-linear time, without comparing against every vector in the index. That's what makes search viable at millions of documents.
Each entry in the vector store has three parts: the vector itself, an identifier, and a payload (the metadata — the original text, the source, the date, whatever you need). When the search returns the K nearest neighbors, you use the payload to retrieve the actual content and deliver it to the LLM.
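The three-part entry and the "search, then read the payload" step can be sketched as a tiny in-memory store. This is an illustration only: `Entry` and `InMemoryVectorStore` are hypothetical names, the vectors are hand-written, and the search is an exact linear scan rather than an ANN index such as HNSW.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Entry:
    id: str                 # identifier
    vector: list            # the embedding
    payload: dict = field(default_factory=dict)  # metadata: text, source, date...

class InMemoryVectorStore:
    def __init__(self):
        self._entries = []

    def add(self, entry):
        self._entries.append(entry)

    def search(self, query_vector, k=2):
        """Exact kNN by cosine similarity (scans every entry)."""
        def score(entry):
            dot = sum(q * v for q, v in zip(query_vector, entry.vector))
            return dot / (math.hypot(*query_vector) * math.hypot(*entry.vector))
        return sorted(self._entries, key=score, reverse=True)[:k]

store = InMemoryVectorStore()
store.add(Entry("doc-1", [0.9, 0.1], {"text": "Account cancellation procedure", "source": "help-center"}))
store.add(Entry("doc-2", [0.1, 0.9], {"text": "Quarterly investment report", "source": "finance"}))
store.add(Entry("doc-3", [0.7, 0.5], {"text": "How refunds work", "source": "help-center"}))

hits = store.search([0.85, 0.2], k=2)  # hypothetical query embedding
# The payloads, not the vectors, are what you hand to the LLM.
context = "\n".join(hit.payload["text"] for hit in hits)
```

Here the two nearest entries are `doc-1` and `doc-3`, and `context` holds their original texts. A production vector store keeps the same entry shape but replaces the linear scan with an index.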
On AWS, Amazon OpenSearch Service with the k-NN plugin and Amazon Aurora PostgreSQL with pgvector are the most common options. Bedrock Knowledge Bases abstracts all of this — but understanding what's underneath is what separates someone who configures from someone who architects. In lesson 06 (RAG) and lesson 18 (Knowledge Bases), you'll see this full flow in action.
Key takeaways from this lesson
Frequently asked questions
Do I need to train my own embedding model?
Almost never. Models like Titan Embeddings v2 or Cohere Embed Multilingual work very well for most cases, including Portuguese. Embedding fine-tuning makes sense in very specific domains (dense medical jargon, proprietary code) — and even then, start with the off-the-shelf model and measure before investing in training.
What is the ideal chunk size for generating embeddings?
It depends on the model and use case, but chunks of 256–512 tokens with ~20% overlap are a solid starting point. Very large chunks dilute the semantic signal; very small chunks lose context. This is a parameter you'll tune with evals — covered in lesson 09.
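As a rough sketch of what "256 tokens with ~20% overlap" means in code, the hypothetical helper below slides a window over a token list. It assumes the text is already tokenized; the example uses made-up word tokens as a stand-in, whereas a real pipeline would count tokens with the embedding model's own tokenizer.

```python
def chunk_tokens(tokens, chunk_size=256, overlap_ratio=0.2):
    """Split a token list into overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(chunk_size * overlap_ratio)  # tokens shared with the next chunk
    step = chunk_size - overlap                # how far the window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this chunk already reached the end of the text
    return chunks

# Stand-in for a tokenized document of 600 tokens.
tokens = [f"w{i}" for i in range(600)]
chunks = chunk_tokens(tokens, chunk_size=256, overlap_ratio=0.2)
```

With these numbers the overlap is 51 tokens and the window advances 205 tokens at a time, so the 600-token document becomes three chunks, the last one shorter than the others. Each chunk is then embedded and stored as its own entry.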
Does vector search replace keyword search (BM25)?
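Not necessarily: they are complementary. Vector search is great for intent and paraphrase; BM25 is great for exact terms, proper names, and codes. Hybrid search (combining both with RRF, Reciprocal Rank Fusion, or similar) usually beats either alone. OpenSearch supports hybrid queries natively; with pgvector, you typically combine vector search with PostgreSQL full-text search in your own query or application code.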
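For a feel of how RRF merges the two result lists, here is a minimal sketch. The function name and document IDs are hypothetical, and `k=60` is a commonly used default rather than a requirement. Each document earns 1 / (k + rank) from every list it appears in, and the sums decide the final order.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

vector_hits = ["doc-a", "doc-b", "doc-c"]   # hypothetical vector search results
keyword_hits = ["doc-c", "doc-a", "doc-d"]  # hypothetical BM25 results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents found by both searches (`doc-a` and `doc-c`) rise to the top, ahead of documents found by only one. RRF uses only ranks, so you never have to reconcile cosine scores with BM25 scores.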
Quick check
1. What does an embedding represent?
2. Semantic search beats keyword search because…