Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

What technical topics does Fernando work with?

Fernando works with AWS, Kubernetes, Kafka, Data Mesh, Amazon Bedrock, RAG, DevSecOps, observability, financial systems and architecture communication using C4, ADRs and trade-off analysis.

Is Fernando available for professional conversations?

Fernando is currently building at Banco Itaú and is open to thoughtful conversations about architecture, cloud, AI, engineering leadership, community, podcasts and technical collaboration.

Production RAG on AWS

Module 3 · Production on AWS · Lesson 10/12

Vector stores on AWS

Where to store embeddings: OpenSearch Serverless, Aurora pgvector and other options.

6 min read

You already have your embeddings — now you need somewhere to store, index, and search them in milliseconds. The choice of vector store defines the latency, cost, and operational complexity of your RAG. On AWS, the main options are OpenSearch Serverless, Aurora PostgreSQL with pgvector, Neptune Analytics, and MemoryDB — each with a different profile of cost, scale, and operations.

What is a vector store — and why it's not just a database

A vector store is a storage and search system optimized for high-dimensional vectors (typically 256–4096 dimensions). Traditional search compares exact values; vector search compares distances — cosine similarity, dot product, Euclidean distance. To do this at scale without scanning every vector, vector stores use ANN (Approximate Nearest Neighbor) indexes, with HNSW (Hierarchical Navigable Small World) being the most common: it organizes vectors in graph layers that allow logarithmic rather than linear navigation.
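To make the three distance measures concrete, here is a minimal sketch in plain Python. The document IDs and the three-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions. The `exact_top_k` helper is the brute-force baseline that an ANN index such as HNSW is designed to avoid: it scores every stored vector on every query.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between a and b (0.0 means identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_top_k(query, vectors, k):
    """Brute-force search: scores every vector, so cost grows linearly with n."""
    scored = [(cosine_similarity(query, v), doc_id) for doc_id, v in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy data, not taken from the lesson.
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "refund-form": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
print(exact_top_k(query, docs, k=2))  # ['refund-policy', 'refund-form']
```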

The practical point: HNSW trades a small degree of imprecision ("approximate") for viable speed and memory. In production, you rarely notice this imprecision — recall typically stays above 95% with well-configured parameters.
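Recall here is usually measured as recall@k: of the k true nearest neighbors found by an exact scan, how many did the approximate index also return. A small sketch, with made-up document IDs, of how you might compute it when validating index parameters:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the ANN index returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Hypothetical results for one query: the ANN index missed "d4".
exact = ["d1", "d2", "d3", "d4", "d5"]
approx = ["d1", "d2", "d3", "d5", "d9"]
print(recall_at_k(approx, exact))  # 0.8
```

Averaging this value over a sample of real queries gives a number you can compare against the 95% figure above.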

Beyond pure vector search, production requires metadata filters ("only documents from customer X", "only current version"), hybrid search (vector + BM25 — covered in lesson 05), and access control. Not every vector store delivers all three with the same maturity. Choosing the wrong one means rewriting the data layer when volume grows or when a customer asks for data isolation.
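As an illustration of what hybrid search has to do under the hood, the sketch below merges a vector ranking and a BM25 ranking with reciprocal rank fusion (RRF). RRF is one common fusion method and is used here as an assumption; it is not necessarily what lesson 05 or any specific AWS service uses. The constant `k=60` is the conventional default, and the document IDs are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = ["a", "b", "c"]   # hypothetical ANN results
bm25_hits = ["c", "a", "d"]     # hypothetical keyword results
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))  # ['a', 'c', 'b', 'd']
```

Documents that appear in both lists rise to the top, which is the behavior you want from hybrid retrieval.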

Vector stores on AWS: where each option fits

Figure: RAG pipeline flow showing the vector store options available on AWS and their usage profiles.

🔵 Application: RAG pipeline

RAG application · Orchestrator
Embedding model · Bedrock / Titan

🟧 AWS: vector stores

OpenSearch Serverless · HNSW + hybrid BM25
Aurora pgvector · Native Postgres
Neptune Analytics · Graph + vectors
MemoryDB · In-memory / low latency

🤖 AWS: LLM

Bedrock LLM · Claude / Titan / etc.

The four options on AWS — real profiles

Amazon OpenSearch Serverless (AOSS) with vector engine is the most complete option for RAG on AWS today. It delivers HNSW vector search, BM25 full-text search, and the combination of both (hybrid search) in the same index. It scales automatically, without managing shards or instances. Cost is based on OCUs (OpenSearch Compute Units) — you pay for what you use, but the minimum of 2 OCUs per collection means small projects may pay more than they would with a provisioned instance. Integration with Bedrock Knowledge Bases (lesson 09) is native.
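For orientation, this is roughly what an index definition with an HNSW vector field looks like in the OpenSearch k-NN plugin. It is a sketch, not a recommended configuration: the field names (`embedding`, `text`, `tenant_id`), the dimension of 1024, and the `m` and `ef_construction` values are assumptions for illustration. The function only builds the request body; sending it to a collection requires a signed client, which is out of scope here.

```python
import json

def knn_index_body(dimension, m=16, ef_construction=128):
    """Build an index body with one HNSW vector field plus text and filter fields."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": dimension,  # must match the embedding model output
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                        "parameters": {"m": m, "ef_construction": ef_construction},
                    },
                },
                "text": {"type": "text"},          # analyzed field, usable for BM25
                "tenant_id": {"type": "keyword"},  # exact-match field for filters
            }
        },
    }

body = knn_index_body(dimension=1024)
print(json.dumps(body["mappings"]["properties"]["embedding"], indent=2))
```

Keeping the text field and the vector field in the same index is what allows BM25 and vector queries to run against the same documents.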

Aurora PostgreSQL with pgvector is the right choice when you already run Postgres and the vector volume is manageable (typically below a few million). pgvector supports HNSW and IVFFlat, full SQL filters, and ACID transactions. Cost is predictable — you pay for the Aurora instance, not per query. The downside: no native hybrid search (you implement BM25 with tsvector manually), and vector search performance at large scale lags behind OpenSearch.
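A minimal sketch of the pgvector side, written as Python helpers that generate the SQL. The table layout (`id`, `tenant_id`, `content`, `embedding`) and the table name are hypothetical; the `vector(n)` column type and the `<=>` cosine distance operator are pgvector's own. The `%s` placeholders assume a psycopg-style driver, which is also an assumption.

```python
CREATE_EXTENSION_SQL = "CREATE EXTENSION IF NOT EXISTS vector;"

def _check_identifier(name):
    """Reject anything that is not a plain identifier before interpolating it."""
    if not name.isidentifier():
        raise ValueError(f"unsafe identifier: {name!r}")
    return name

def create_table_sql(table, dimension):
    """DDL for a chunk table with a fixed-dimension embedding column."""
    table = _check_identifier(table)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id bigserial PRIMARY KEY, "
        "tenant_id text NOT NULL, "
        "content text NOT NULL, "
        f"embedding vector({int(dimension)}));"
    )

def search_sql(table):
    """Top-k query: ordinary SQL filter plus ordering by cosine distance."""
    table = _check_identifier(table)
    return (
        f"SELECT id, content FROM {table} "
        "WHERE tenant_id = %s "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s;"
    )

def to_vector_literal(values):
    """Format a Python sequence as pgvector's text form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

print(create_table_sql("chunks", 1024))
print(search_sql("chunks"))
print(to_vector_literal([1, 0.5]))  # [1.0,0.5]
```

The `WHERE` clause is where the "full SQL filters" advantage shows up: any column or JOIN can narrow the candidate set.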

Neptune Analytics makes sense when your data has graph structure — entities, relationships, hierarchies. RAG over knowledge graphs is a real use case, but niche. If you don't have a graph, don't force it.

MemoryDB for Redis with vector support delivers sub-5ms latencies, ideal for real-time RAG (live chat, voice assistants). The trade-off: memory cost is high, and the volume of vectors you can store is limited by cluster size.
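The "limited by cluster size" point is easy to quantify with back-of-the-envelope arithmetic: a float32 value takes 4 bytes, so raw storage is vectors × dimensions × 4. The `index_overhead` multiplier below is a placeholder assumption for index structures and metadata, not a measured figure for MemoryDB.

```python
def raw_vector_bytes(num_vectors, dimension, bytes_per_value=4):
    """Raw storage for float32 embeddings, before any index overhead."""
    return num_vectors * dimension * bytes_per_value

def estimated_gib(num_vectors, dimension, index_overhead=1.5):
    """Rough in-memory footprint in GiB; index_overhead is an assumed multiplier."""
    return raw_vector_bytes(num_vectors, dimension) * index_overhead / 2**30

# One million 1024-dimensional vectors.
print(round(estimated_gib(1_000_000, 1024, index_overhead=1.0), 2))  # 3.81
print(round(estimated_gib(1_000_000, 1024), 2))                      # 5.72
```

Multiply that by replicas and it becomes clear why an in-memory store suits small, latency-critical collections rather than a full corpus.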

When to use each vector store

OpenSearch Serverless

Pros

Native hybrid search (vector + BM25) with no extra code
Auto-scaling — no shard management
Native integration with Bedrock Knowledge Bases
Robust metadata filters

Cons

Minimum 2 OCUs per collection — expensive for small projects
Unpredictable cost during ingestion spikes
Not relational — no JOINs with transactional data

Default choice for production RAG with medium/high volume or hybrid search requirement

Aurora pgvector

Pros

No new infrastructure if you already use Postgres
Full SQL filters and JOINs with relational data
Predictable cost (fixed instance)
ACID — transactional consistency

Cons

Manual hybrid search via tsvector — more code
Vector performance degrades above ~5M vectors
Bedrock Knowledge Bases integration is supported, but you provision the Aurora cluster and point Bedrock to it yourself

Ideal when you already run Postgres, volume is smaller, and operational simplicity matters

Neptune Analytics

Pros

Combines vector search with graph traversal
Ideal for RAG over knowledge graphs

Cons

Niche — only makes sense with graph data
Gremlin/openCypher learning curve
No native hybrid search

Use only if your data is already a knowledge graph

MemoryDB (Redis)

Pros

Sub-5ms latency — the fastest on the list
Good for real-time RAG (voice, live chat)

Cons

High memory cost per stored vector
Volume limited by cluster size
No native hybrid search

Use when latency is the dominant requirement and vector volume is small

In practice: my default recommendation

Fernando F. Azevedo, Senior Solutions Architect

In practice, for most production RAG projects on AWS, I start with OpenSearch Serverless. Native hybrid search and Bedrock Knowledge Bases integration save weeks of work. The minimum 2 OCU cost hurts in POCs — in those cases I use pgvector on an Aurora Serverless v2 with auto-pause, which costs pennies when idle. When the project grows and needs real hybrid search, I migrate to OpenSearch. I never choose MemoryDB as the primary vector store — the memory cost doesn't justify it except in very specific latency-critical cases.
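The recommendation above can be written down as a small decision helper. This is only a sketch of the heuristic, not a sizing tool: the `Workload` fields are hypothetical, the 5M threshold comes from the pgvector guidance in this lesson, and the 1M cutoff for "small volume" on MemoryDB is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    is_poc: bool = False
    has_graph_data: bool = False
    latency_critical: bool = False
    needs_hybrid_search: bool = False
    already_runs_postgres: bool = False
    expected_vectors: int = 0

def suggest_vector_store(w):
    """Encode the lesson's default recommendation as ordered rules."""
    if w.has_graph_data:
        return "Neptune Analytics"
    if w.latency_critical and w.expected_vectors < 1_000_000:  # assumed cutoff
        return "MemoryDB"
    if w.is_poc and not w.needs_hybrid_search:
        return "Aurora pgvector (Serverless v2)"
    if (w.already_runs_postgres and not w.needs_hybrid_search
            and w.expected_vectors < 5_000_000):
        return "Aurora pgvector"
    return "OpenSearch Serverless"

print(suggest_vector_store(Workload(is_poc=True)))
print(suggest_vector_store(Workload(needs_hybrid_search=True, expected_vectors=20_000_000)))
```

The rule order matters: the niche cases are checked first, and OpenSearch Serverless is the fallthrough default.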

Selection criteria: what really matters in production

Beyond technical profile, four criteria define the choice in production:

Scale and predictable growth. OpenSearch Serverless scales without intervention, but charges per OCU consumed. Aurora pgvector has a performance ceiling: if you expect to grow to tens of millions of vectors, migrating later is expensive. Size with headroom.

FinOps: serverless vs. provisioned. Serverless (AOSS) has variable cost — great when traffic is unpredictable, bad when you have constant high load. For constant high load, provisioned OpenSearch (not serverless) can be 40–60% cheaper. For low, intermittent load, Aurora Serverless v2 with auto-pause wins. Lesson 12 goes deeper on the full pipeline cost model.
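The serverless-versus-provisioned comparison comes down to simple monthly arithmetic. The sketch below shows the shape of the calculation only: the hourly prices are placeholders that you must replace with current figures for your region, and 730 hours per month is the usual approximation.

```python
HOURS_PER_MONTH = 730  # common billing approximation

def aoss_monthly_cost(avg_ocus, price_per_ocu_hour, min_ocus=2):
    """Serverless cost: you are billed for at least min_ocus even when idle."""
    billed_ocus = max(avg_ocus, min_ocus)
    return billed_ocus * price_per_ocu_hour * HOURS_PER_MONTH

def provisioned_monthly_cost(instances, price_per_instance_hour):
    """Provisioned cost: fixed, regardless of traffic."""
    return instances * price_per_instance_hour * HOURS_PER_MONTH

# Placeholder prices, NOT real AWS pricing.
OCU_PRICE = 0.24
INSTANCE_PRICE = 0.30

idle_floor = aoss_monthly_cost(avg_ocus=0, price_per_ocu_hour=OCU_PRICE)
busy = aoss_monthly_cost(avg_ocus=8, price_per_ocu_hour=OCU_PRICE)
fixed = provisioned_monthly_cost(instances=3, price_per_instance_hour=INSTANCE_PRICE)
print(round(idle_floor, 2), round(busy, 2), round(fixed, 2))
```

With sustained load, `busy` grows with average OCUs while `fixed` stays flat, which is where provisioned capacity starts to win.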

Metadata filters and tenant isolation. If you have multiple customers in the same index, metadata filters are critical for security. OpenSearch and pgvector both support robust filters. But index design matters: filters on non-indexed fields are slow. Plan your filter fields before creating the index — changing them later requires full reindexing.

Operations and team expertise. If your team already runs Postgres, pgvector has zero adoption curve. If nobody knows OpenSearch, serverless significantly reduces operational overhead — you don't manage clusters, shards, or upgrades.

Quick comparison: vector stores on AWS

Criterion	OpenSearch Serverless	Aurora pgvector	Neptune Analytics	MemoryDB
Native hybrid search	✅ Yes	⚠️ Manual (tsvector)	❌ No	❌ No
Scale (vectors)	High (billions)	Medium (~5M)	Medium	Low (RAM-bound)
Metadata filters	✅ Robust	✅ Full SQL	⚠️ Limited	⚠️ Basic
Bedrock KB integration	✅ Native	✅ Native	❌ No	❌ No
Cost model	OCU (variable)	Instance (fixed)	Instance (fixed)	Memory (high/GB)
Search latency	~10–50 ms	~20–100 ms	~20–80 ms	< 5 ms
Operational overhead	Low (serverless)	Low (familiar)	Medium	Medium

Key takeaways from this lesson

HNSW is the standard index for approximate vector search — trades minimal precision for viable speed at scale.

OpenSearch Serverless is the default choice for production RAG on AWS: native hybrid search, Bedrock KB integration, auto-scaling.

pgvector is the right choice when you already run Postgres and volume is smaller — zero new infrastructure, predictable cost.

The OpenSearch Serverless minimum of 2 OCUs hurts in POCs — use Aurora Serverless v2 with auto-pause in those cases.

Plan metadata filter fields before creating the index — changing them later requires full reindexing.

Neptune Analytics and MemoryDB are niche cases: knowledge graph and critical latency, respectively.

Frequently asked questions

Can I use provisioned OpenSearch (not serverless) as a vector store?

Yes. Provisioned OpenSearch Service supports the same HNSW vector engine and hybrid search. For constant high loads, it can be 40–60% cheaper than serverless. The trade-off is operational: you manage instances, shards, and upgrades. For most teams, serverless is worth the extra cost for simplicity.

Does Bedrock Knowledge Bases support pgvector as a vector store?

Yes, since 2024 Bedrock Knowledge Bases supports Aurora PostgreSQL with pgvector as a managed vector store option, in addition to OpenSearch Serverless. The integration is native — you point to the Aurora cluster and Bedrock manages ingestion and search.

What is the difference between HNSW and IVFFlat in pgvector?

HNSW has better recall and search performance, but uses more memory and has slower index build. IVFFlat has faster build and smaller memory footprint, but lower recall. For production RAG, prefer HNSW — the recall difference matters when you're searching for the top-k most relevant chunks.
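In pgvector the two index types differ only in the `USING` clause and their build parameters. A small sketch that generates both statements: the table and column names are hypothetical, and the parameter values shown (`m = 16`, `ef_construction = 64`, `lists = 100`) are starting points rather than tuned recommendations.

```python
def hnsw_index_sql(table, column="embedding", m=16, ef_construction=64):
    """HNSW: slower build and more memory, better recall at query time."""
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {int(m)}, ef_construction = {int(ef_construction)});"
    )

def ivfflat_index_sql(table, column="embedding", lists=100):
    """IVFFlat: faster build and smaller footprint, lower recall."""
    return (
        f"CREATE INDEX ON {table} USING ivfflat ({column} vector_cosine_ops) "
        f"WITH (lists = {int(lists)});"
    )

# Query-time knobs that trade speed for recall (session-level settings).
HNSW_QUERY_TUNING = "SET hnsw.ef_search = 100;"
IVFFLAT_QUERY_TUNING = "SET ivfflat.probes = 10;"

print(hnsw_index_sql("chunks"))
print(ivfflat_index_sql("chunks"))
```

Raising `hnsw.ef_search` or `ivfflat.probes` increases recall at the cost of latency, so measure both before settling on values.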

How does data isolation between tenants work in the same OpenSearch index?

You add a metadata field (e.g., tenant_id) to each document and apply a mandatory filter on all queries. OpenSearch supports pre-query filters that are applied before vector search, ensuring one tenant never sees another's data. Important: this field needs to be mapped as keyword in the index for adequate performance.
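A sketch of what such a query body can look like, using the filter clause inside the OpenSearch `knn` query. The field names `embedding` and `tenant_id` are assumptions, and the helper refuses to build a query without a tenant so the filter cannot be forgotten by accident. Building the filter on the server side, from the authenticated identity rather than from user input, is the part that actually enforces isolation.

```python
def tenant_knn_query(query_vector, tenant_id, k=5):
    """k-NN query body restricted to one tenant via a term filter."""
    if not tenant_id:
        raise ValueError("tenant_id is mandatory for every query")
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": list(query_vector),
                    "k": k,
                    # Applied during the vector search, not after it.
                    "filter": {"term": {"tenant_id": tenant_id}},
                }
            }
        },
    }

body = tenant_knn_query([0.1, 0.2, 0.3], tenant_id="acme", k=3)
print(body["query"]["knn"]["embedding"]["filter"])  # {'term': {'tenant_id': 'acme'}}
```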

My take

OpenSearch Serverless for production; pgv

Choosing a vector store is not a commodity decision: it affects latency, cost, and what you can do with hybrid search. On AWS, OpenSearch Serverless is the mature production choice: it delivers hybrid search without extra code, scales without operations work, and integrates natively with Bedrock. pgvector is the pragmatic choice when you already run Postgres and have a smaller volume. Don't overcomplicate it: choose the option your team can operate well, with the filters your use case requires, and size it with headroom. Reindexing billions of vectors because you chose wrong is an expensive problem to have.
