Vector stores on AWS
Where to store embeddings: OpenSearch Serverless, Aurora pgvector and other options.
You already have your embeddings. Now you need somewhere to store, index, and search them in milliseconds. The choice of vector store determines the latency, cost, and operational complexity of your RAG pipeline. On AWS, the main options are OpenSearch Serverless, Aurora PostgreSQL with pgvector, Neptune Analytics, and MemoryDB, each with a different profile of cost, scale, and operations.
What is a vector store — and why it's not just a database
A vector store is a storage and search system optimized for high-dimensional vectors (typically 256–4096 dimensions). Traditional search compares exact values; vector search compares distances — cosine similarity, dot product, Euclidean distance. To do this at scale without scanning every vector, vector stores use ANN (Approximate Nearest Neighbor) indexes, with HNSW (Hierarchical Navigable Small World) being the most common: it organizes vectors in graph layers that allow logarithmic rather than linear navigation.
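To make the three distance measures concrete, here is a minimal pure-Python sketch. The function names and the toy two-dimensional vectors are illustrative assumptions; real embeddings have hundreds or thousands of dimensions, and a vector store computes these measures inside its index rather than in application code.

```python
import math

def dot(a, b):
    """Dot product: larger means more similar (sensitive to vector length)."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Angle-based similarity in [-1, 1]: ignores vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors (hypothetical, for illustration only).
query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # points the same way, but twice as long
orthogonal = [0.0, 3.0]       # unrelated direction

print(cosine_similarity(query, same_direction))   # 1.0
print(cosine_similarity(query, orthogonal))       # 0.0
print(euclidean_distance(query, same_direction))  # 1.0
print(dot(query, same_direction))                 # 2.0
```

Note how cosine similarity treats `same_direction` as a perfect match while Euclidean distance does not. Which metric is appropriate depends on how the embedding model was trained, so check the model's documentation before choosing the index's distance function.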
The practical point: HNSW trades a small loss of exactness (the "approximate" in ANN) for search speed that stays viable at scale. In production you rarely notice the imprecision: recall typically stays above 95% with well-configured parameters.
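Recall here means the share of the true nearest neighbors that the approximate index actually returned. A minimal sketch of how you might measure it, assuming you can afford a brute-force scan over a small sample: the helper names, the toy vectors, and the `approx` result list are all hypothetical stand-ins for what your index would return.

```python
import math

def exact_top_k(query, vectors, k):
    """Brute-force baseline: rank every vector by Euclidean distance."""
    ranked = sorted(vectors, key=lambda vid: math.dist(query, vectors[vid]))
    return ranked[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also found."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)

# Toy corpus (hypothetical ids and coordinates).
vectors = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 2.0],
    "d": [5.0, 5.0],
}

exact = exact_top_k([0.1, 0.0], vectors, k=3)

# Pretend the ANN index returned this list: it missed "c" and returned "d".
approx = ["a", "b", "d"]

print(exact)                       # ['a', 'b', 'c']
print(recall_at_k(approx, exact))  # 2 of 3 true neighbors found
```

Running this kind of check on a sample of real queries is a cheap way to verify that your index parameters actually deliver the recall you expect.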
Beyond pure vector search, production requires metadata filters ("only documents from customer X", "only current version"), hybrid search (vector + BM25 — covered in lesson 05), and access control. Not every vector store delivers all three with the same maturity. Choosing the wrong one means rewriting the data layer when volume grows or when a customer asks for data isolation.
Vector stores on AWS: where each option fits
RAG pipeline flow showing the different vector store options available on AWS and their usage profiles.
- RAG application · Orchestrator
- Embedding Model · Bedrock / Titan
- OpenSearch Serverless · HNSW + BM25 hybrid
- Aurora pgvector · Native Postgres
- Neptune Analytics · Graph + vectors
- MemoryDB · In-memory / low latency
- Bedrock LLM · Claude / Titan / etc.
The four options on AWS — real profiles
Amazon OpenSearch Serverless (AOSS) with vector engine is the most complete option for RAG on AWS today. It delivers HNSW vector search, BM25 full-text search, and the combination of both (hybrid search) in the same index. It scales automatically, without managing shards or instances. Cost is based on OCUs (OpenSearch Compute Units) — you pay for what you use, but the minimum of 2 OCUs per collection means small projects may pay more than they would with a provisioned instance. Integration with Bedrock Knowledge Bases (lesson 09) is native.
Aurora PostgreSQL with pgvector is the right choice when you already run Postgres and the vector volume is manageable (typically below a few million). pgvector supports HNSW and IVFFlat, full SQL filters, and ACID transactions. Cost is predictable — you pay for the Aurora instance, not per query. The downside: no native hybrid search (you implement BM25 with tsvector manually), and vector search performance at large scale lags behind OpenSearch.
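As a rough sketch of what "full SQL filters" looks like with pgvector, the snippet below holds the SQL as Python strings so it runs without a database. The table name `chunks`, its columns, the 1024-dimension size, and the helper functions are assumptions for illustration; the `vector` type and the `<=>` cosine-distance operator come from pgvector itself. The named placeholders follow the psycopg style, which is also an assumption about your driver.

```python
# Schema: one row per chunk, with the embedding next to ordinary columns.
CREATE_TABLE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id         bigserial PRIMARY KEY,
    tenant_id  text NOT NULL,
    content    text NOT NULL,
    embedding  vector(1024)  -- must match your embedding model's dimension
);
"""

# Search: a regular WHERE clause plus ordering by cosine distance (<=>).
SEARCH_SQL = """
SELECT id, content, embedding <=> %(query_vec)s::vector AS cosine_distance
FROM chunks
WHERE tenant_id = %(tenant_id)s
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s;
"""

def to_pgvector_literal(values):
    """Format a Python sequence as pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

def search_params(query_embedding, tenant_id, k=5):
    """Build the parameter dict for SEARCH_SQL."""
    return {
        "query_vec": to_pgvector_literal(query_embedding),
        "tenant_id": tenant_id,
        "k": k,
    }

params = search_params([1, 0.5], tenant_id="customer-x", k=3)
print(params["query_vec"])  # [1.0,0.5]
```

With a driver such as psycopg you would pass `SEARCH_SQL` and `params` to `cursor.execute`. Because the filter is plain SQL, it can also JOIN against your transactional tables in the same query.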
Neptune Analytics makes sense when your data has graph structure — entities, relationships, hierarchies. RAG over knowledge graphs is a real use case, but niche. If you don't have a graph, don't force it.
MemoryDB for Redis with vector support delivers sub-5ms latencies, ideal for real-time RAG (live chat, voice assistants). The trade-off: memory cost is high, and the volume of vectors you can store is limited by cluster size.
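A back-of-the-envelope sizing helps show why memory becomes the constraint. The sketch below assumes float32 embeddings (4 bytes per value) and counts only the raw vectors; index structures, metadata, and replication add more on top, and how much depends on the engine and its settings. The function names and the example figures are illustrative.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone (float32 by default)."""
    return num_vectors * dimensions * bytes_per_value

def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / (1024 ** 3)

# Example: one million 1024-dimensional embeddings (hypothetical workload).
size = raw_vector_bytes(1_000_000, 1024)
print(size)                    # 4096000000 bytes
print(round(to_gib(size), 2))  # 3.81 GiB before any index overhead
```

Multiply that by your expected growth and your replica count before pricing a cluster.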
When to use each vector store
OpenSearch Serverless
- Native hybrid search (vector + BM25) with no extra code
- Auto-scaling — no shard management
- Native integration with Bedrock Knowledge Bases
- Robust metadata filters
- Minimum 2 OCUs per collection — expensive for small projects
- Unpredictable cost during ingestion spikes
- Not relational — no JOINs with transactional data
Default choice for production RAG with medium/high volume or hybrid search requirement
Aurora pgvector
- No new infrastructure if you already use Postgres
- Full SQL filters and JOINs with relational data
- Predictable cost (fixed instance)
- ACID — transactional consistency
- Manual hybrid search via tsvector — more code
- Vector performance degrades above ~5M vectors
- Supported as a Bedrock Knowledge Bases vector store (see the FAQ)
Ideal when you already run Postgres, volume is smaller, and operational simplicity matters
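To give a sense of the "more code" that manual hybrid search implies: once you have one ranked list from the vector query and another from the tsvector full-text query, you still have to merge them. The sketch below uses reciprocal rank fusion (RRF), a common merging technique chosen here for illustration; it is not necessarily the approach this course uses, and the chunk ids are hypothetical. The constant `k=60` is a commonly used default, not a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results from the two separate queries.
vector_hits = ["c1", "c2", "c3"]   # ordered by vector distance
keyword_hits = ["c3", "c1", "c4"]  # ordered by full-text rank

print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['c1', 'c3', 'c2', 'c4']
```

Chunks that appear in both lists rise to the top, which is the behavior you get without extra code from a store that supports hybrid search natively.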
Neptune Analytics
- Combines vector search with graph traversal
- Ideal for RAG over knowledge graphs
- Niche — only makes sense with graph data
- Gremlin/openCypher learning curve
- No native hybrid search
Use only if your data is already a knowledge graph
MemoryDB (Redis)
- Sub-5ms latency — the fastest on the list
- Good for real-time RAG (voice, live chat)
- High memory cost per stored vector
- Volume limited by cluster size
- No native hybrid search
Use when latency is the dominant requirement and vector volume is small
In practice, for most production RAG projects on AWS, I start with OpenSearch Serverless. Native hybrid search and Bedrock Knowledge Bases integration save weeks of work. The minimum 2 OCU cost hurts in POCs — in those cases I use pgvector on an Aurora Serverless v2 with auto-pause, which costs pennies when idle. When the project grows and needs real hybrid search, I migrate to OpenSearch. I never choose MemoryDB as the primary vector store — the memory cost doesn't justify it except in very specific latency-critical cases.
Selection criteria: what really matters in production
Beyond technical profile, four criteria define the choice in production:
Scale and predictable growth. OpenSearch Serverless scales without intervention, but charges per OCU consumed. Aurora pgvector has a performance ceiling: if you expect to grow to tens of millions of vectors, migrating later is expensive. Size the store with headroom.
FinOps: serverless vs. provisioned. Serverless (AOSS) has variable cost — great when traffic is unpredictable, bad when you have constant high load. For constant high load, provisioned OpenSearch (not serverless) can be 40–60% cheaper. For low, intermittent load, Aurora Serverless v2 with auto-pause wins. Lesson 12 goes deeper on the full pipeline cost model.
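The arithmetic behind this comparison can be sketched in a few lines. Every price below is a placeholder, not a real AWS rate, so substitute the figures from the current pricing pages for your region. The only modeling assumption taken from this lesson is the 2-OCU minimum per collection; the function names and the 730-hour month are illustrative.

```python
HOURS_PER_MONTH = 730  # approximate hours in a month

def serverless_monthly_cost(avg_ocus, price_per_ocu_hour, min_ocus=2):
    """OCU-based cost: you never pay for less than the collection minimum."""
    billed_ocus = max(avg_ocus, min_ocus)
    return billed_ocus * price_per_ocu_hour * HOURS_PER_MONTH

def savings_pct(baseline, alternative):
    """How much cheaper `alternative` is than `baseline`, in percent."""
    return (baseline - alternative) / baseline * 100

PLACEHOLDER_OCU_PRICE = 0.25          # hypothetical price per OCU-hour
PLACEHOLDER_PROVISIONED_COST = 900.0  # hypothetical fixed monthly cluster cost

idle_poc = serverless_monthly_cost(0, PLACEHOLDER_OCU_PRICE)
steady_load = serverless_monthly_cost(10, PLACEHOLDER_OCU_PRICE)

print(idle_poc)     # 365.0: the floor you pay even with no traffic
print(steady_load)  # 1825.0
print(round(savings_pct(steady_load, PLACEHOLDER_PROVISIONED_COST), 1))
```

The idle case shows why the minimum hurts in a POC, and the steady case shows the shape of the comparison against a fixed provisioned cost. The actual percentage depends entirely on the real prices you plug in.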
Metadata filters and tenant isolation. If you have multiple customers in the same index, metadata filters are critical for security. OpenSearch and pgvector both support robust filters. But index design matters: filters on non-indexed fields are slow. Plan your filter fields before creating the index — changing them later requires full reindexing.
Operations and team expertise. If your team already runs Postgres, pgvector has zero adoption curve. If nobody knows OpenSearch, serverless significantly reduces operational overhead — you don't manage clusters, shards, or upgrades.
Quick comparison: vector stores on AWS
| Criterion | OpenSearch Serverless | Aurora pgvector | Neptune Analytics | MemoryDB |
|---|---|---|---|---|
| Native hybrid search | ✅ Yes | ⚠️ Manual (tsvector) | ❌ No | ❌ No |
| Scale (vectors) | High (billions) | Medium (~5M) | Medium | Low (RAM-bound) |
| Metadata filters | ✅ Robust | ✅ Full SQL | ⚠️ Limited | ⚠️ Basic |
| Bedrock KB integration | ✅ Native | ✅ Native | ❌ No | ❌ No |
| Cost model | OCU (variable) | Instance (fixed) | Instance (fixed) | Memory (high/GB) |
| Search latency | ~10–50ms | ~20–100ms | ~20–80ms | < 5ms |
| Operational overhead | Low (serverless) | Low (familiar) | Medium | Medium |
Frequently asked questions
Can I use provisioned OpenSearch (not serverless) as a vector store?
Yes. Provisioned OpenSearch Service supports the same HNSW vector engine and hybrid search. For constant high loads, it can be 40–60% cheaper than serverless. The trade-off is operational: you manage instances, shards, and upgrades. For most teams, serverless is worth the extra cost for simplicity.
Does Bedrock Knowledge Bases support pgvector as a vector store?
Yes, since 2024 Bedrock Knowledge Bases supports Aurora PostgreSQL with pgvector as a managed vector store option, in addition to OpenSearch Serverless. The integration is native — you point to the Aurora cluster and Bedrock manages ingestion and search.
What is the difference between HNSW and IVFFlat in pgvector?
HNSW has better recall and search performance, but uses more memory and has slower index build. IVFFlat has faster build and smaller memory footprint, but lower recall. For production RAG, prefer HNSW — the recall difference matters when you're searching for the top-k most relevant chunks.
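For reference, this is roughly what the two index types look like in pgvector DDL, kept as Python strings so the snippet runs without a database. The table `chunks` and column `embedding` are hypothetical names. The parameter values shown for HNSW (`m`, `ef_construction`) match pgvector's documented defaults, while `lists`, `ef_search`, and `probes` are starting points to tune against your own recall measurements, not recommendations.

```python
INDEX_DDL = {
    # HNSW: slower build and more memory, better recall/speed trade-off.
    "hnsw": """
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
ON chunks USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
""",
    # IVFFlat: faster build and smaller footprint, lower recall.
    "ivfflat": """
CREATE INDEX IF NOT EXISTS chunks_embedding_ivfflat
ON chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
""",
}

# Query-time knobs: higher values raise recall at the cost of latency.
QUERY_TUNING = {
    "hnsw": "SET hnsw.ef_search = 100;",
    "ivfflat": "SET ivfflat.probes = 10;",
}

def index_statements(kind):
    """Return (DDL, query-time setting) for 'hnsw' or 'ivfflat'."""
    if kind not in INDEX_DDL:
        raise ValueError(f"unknown index type: {kind!r}")
    return INDEX_DDL[kind], QUERY_TUNING[kind]

ddl, tuning = index_statements("hnsw")
print(tuning)  # SET hnsw.ef_search = 100;
```

The operator class (`vector_cosine_ops` here) has to match the distance operator used in your queries, otherwise the planner will not use the index.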
How does data isolation between tenants work in the same OpenSearch index?
You add a metadata field (e.g., tenant_id) to each document and apply a mandatory filter on all queries. OpenSearch supports pre-query filters that are applied before vector search, ensuring one tenant never sees another's data. Important: this field needs to be mapped as keyword in the index for adequate performance.
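A minimal sketch of that pattern, expressed as the request bodies you would send to OpenSearch. The field names (`tenant_id`, `content`, `embedding`), the 1024 dimension, and the helper functions are assumptions for illustration; the `knn_vector` type, the `knn` query, and its `filter` clause are part of the OpenSearch k-NN plugin, though filter support varies by engine and version, so verify it against your collection.

```python
def index_body(dimension=1024):
    """Index definition: tenant_id mapped as keyword, embedding as knn_vector."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "tenant_id": {"type": "keyword"},  # exact-match filter field
                "content": {"type": "text"},
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                },
            }
        },
    }

def tenant_knn_query(query_embedding, tenant_id, k=5):
    """k-NN query that always carries the tenant filter."""
    if not tenant_id:
        # Refuse to build an unfiltered query: isolation must not be optional.
        raise ValueError("tenant_id is mandatory for every query")
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": list(query_embedding),
                    "k": k,
                    "filter": {"term": {"tenant_id": tenant_id}},
                }
            }
        },
    }

body = tenant_knn_query([0.1, 0.2, 0.3], tenant_id="customer-x", k=3)
print(body["query"]["knn"]["embedding"]["filter"])
# {'term': {'tenant_id': 'customer-x'}}
```

Routing every search through a single helper like `tenant_knn_query` makes it harder for a new code path to forget the filter.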
My take
A vector store is not a commodity decision: it affects latency, cost, and what you can do with hybrid search. On AWS, OpenSearch Serverless is the mature production choice: it delivers hybrid search without extra code, scales without operations, and integrates natively with Bedrock. pgvector is the pragmatic choice when you already have Postgres and smaller volume. Don't overcomplicate it: choose the option your team can operate well, with the filters your use case requires, and size it with headroom. Reindexing billions of vectors because you chose wrong is an expensive problem to have.