Playbook: Vector Store on AWS — OpenSearch Serverless vs Aurora pgvector vs S3 Vectors
Picking the wrong vector store is where your RAG bill explodes, or where p99 latency becomes unacceptable in production. This playbook maps the three main AWS paths (OpenSearch Serverless, Aurora pgvector, S3 Vectors) across three real axes: required latency, cost model, and operational burden. Leave with a defensible decision, not the tutorial default.
Every RAG tutorial points to a vector store. No tutorial tells you what happens when traffic disappears at 3 AM and you're still paying for idle OCUs, or when your HNSW index grows without tuning and latency silently doubles. The vector store choice is a cost, latency, and operations decision — not a feature list comparison. This playbook gives you the three decision axes and a clear path for each workload profile.
What you'll be able to decide after this playbook
Quick Reference — the three stores in numbers
- OpenSearch Serverless — minimum cost (OCU)
- 0.24 USD/OCU-hour; minimum of 2 OCUs (1 for indexing + 1 for search) ≈ 2 × 0.24 × 720 h ≈ $345/month at rest (us-east-1, estimate)
- Aurora pgvector — supported dimensions
- Up to 16,000 dimensions per stored vector (pgvector ≥ 0.7); HNSW and IVFFlat indexes available, but they index at most 2,000 dimensions for the vector type (4,000 with halfvec)
- S3 Vectors — pricing model
- Charged per GB stored + per query request; no idle compute cost (GA 2025)
- Native integration with Bedrock Knowledge Bases
- OpenSearch Serverless: yes (native). Aurora pgvector: yes (via RDS Data API). S3 Vectors: yes (GA 2025)
- Hybrid search (vector + lexical)
- OpenSearch Serverless: yes (BM25 + kNN native). Aurora pgvector: manual (pg_trgm + pgvector). S3 Vectors: not native
- Operational model
- OpenSearch Serverless: fully managed. Aurora pgvector: managed (RDS), but index tuning is yours. S3 Vectors: fully managed
The mental model that unlocks the decision
Think of vector stores the way you think about databases: there is no best — there is the right one for the access profile. The classic mistake is treating the choice as a features decision ("which supports more dimensions?") when in practice it is a decision across three orthogonal axes:
Axis 1 — Required latency (p99): What is your application's SLA? An interactive chatbot tolerates ~200ms p99 on vector search. A batch reranking pipeline can tolerate seconds. An autonomous agent making multiple searches per turn needs consistent latency, not just a low median. p99 is the number that matters, not p50.
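Here is why the tail matters more for an agent. Suppose each search independently has a 1% chance of landing beyond the p99. Then a turn with k searches hits at least one slow search with probability 1 − 0.99^k. The sketch below assumes the searches are independent. Real tail events are often correlated, which can make the result better or worse. It also includes a nearest-rank percentile helper for reading p50 and p99 from raw samples. The function names are illustrative.

```python
import math


def p_turn_hits_tail(searches_per_turn: int, tail_prob: float = 0.01) -> float:
    """Probability that at least one of k independent searches lands beyond
    the latency quantile; tail_prob=0.01 means 'slower than the p99'."""
    if searches_per_turn < 0 or not 0.0 <= tail_prob <= 1.0:
        raise ValueError("need searches_per_turn >= 0 and 0 <= tail_prob <= 1")
    return 1.0 - (1.0 - tail_prob) ** searches_per_turn


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of raw latency samples, with 0 < pct <= 100."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need samples and 0 < pct <= 100")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

With 5 searches per turn, `p_turn_hits_tail(5)` is about 0.049. Roughly 1 turn in 20 therefore waits on at least one p99-or-worse search.

Now take `latencies = [20.0] * 98 + [400.0, 900.0]`. Here `percentile(latencies, 50)` is 20.0 while `percentile(latencies, 99)` is 400.0. The median looks healthy, yet the tail breaks a 200ms SLA.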
Axis 2 — Cost model (idle vs volume): OpenSearch Serverless charges per OCU-hour regardless of load — you pay the floor, not usage. Aurora pgvector charges for the RDS instance (which you probably already have). S3 Vectors charges per storage and per query — near-zero cost at rest, grows with query volume. If your load is highly variable or seasonal, the cost model dominates the decision.
Axis 3 — Operational burden (managed vs you tune): OpenSearch Serverless abstracts sharding, replication, and scaling. S3 Vectors is object + query, no visible infrastructure. Aurora pgvector gives you full control — and full responsibility: you need to choose between HNSW and IVFFlat, set m and ef_construction, monitor index bloat, and understand that a poorly timed VACUUM can degrade latency in production.
Most architecture mistakes I see come from ignoring Axis 2 (idle cost) or Axis 3 (index tuning). Tutorials use OpenSearch Serverless because it's the lowest-friction path for a demo. Production is different.
Head-to-head comparison — the three AWS vector stores
| Criterion | OpenSearch Serverless | Aurora pgvector | S3 Vectors |
|---|---|---|---|
| Best for | Agentic RAG, hybrid search, rich filters, variable load with ms SLA | You already have Postgres; vectors alongside relational data; joins with business metadata | Huge corpus (billions of vectors), sporadic access, storage cost dominant |
| Typical p99 latency (ANN search) | 10–80ms (warmed load, adequate OCUs) | 5–150ms (depends on index, instance, and vacuum) | 50–500ms+ (archive profile; cold start possible) |
| Cost model | OCU-hour (minimum ~$345/month idle); auto-scales up | RDS instance + storage; you already pay if the DB exists | GB stored + per query; near-zero idle cost |
| Hybrid search (vector + lexical) | Native (BM25 + kNN, normalization pipeline) | Manual (pg_trgm + pgvector; you write the SQL) | Not natively available |
| Metadata filters | Rich (pre-filter and post-filter, nested fields) | Full via SQL (native WHERE clause) | Supported by bucket/prefix; advanced filters limited |
| Operational burden | Low (fully managed; no exposed index tuning) | Medium-high (HNSW/IVFFlat tuning, vacuum, bloat, instance sizing) | Low (fully managed; object + query model) |
| Bedrock Knowledge Bases integration | Native and mature (default option in tutorials) | Yes (via RDS Data API) | Yes (GA 2025) |
| When NOT to use | Low/sporadic load where minimum cost isn't justified; tight budget in dev/staging | No existing Postgres; team with no index tuning experience; volume > 100M vectors without sharding | ms SLA for interactive production; hybrid search required; complex metadata filters |
OpenSearch Serverless: the power and the trap of the cost floor
OpenSearch Serverless with the vector engine is genuinely the most capable store for production RAG with complex requirements. Native hybrid search (BM25 + kNN with a normalization pipeline), pre- and post-filtering, direct Bedrock Knowledge Bases integration, and automatic OCU scaling are real differentiators, not marketing.
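BM25 scores and kNN similarities live on different scales. That is why hybrid search needs a normalization step before the two result lists can be combined. The pure-Python sketch below shows one such combination: min-max normalization followed by a weighted arithmetic mean. OpenSearch's normalization processor can be configured with both techniques. This illustrates the idea, not the OpenSearch API. The 0.3/0.7 weights and the document IDs are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one result list to [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_combine(lexical: dict[str, float], vector: dict[str, float],
                   w_lexical: float = 0.3,
                   w_vector: float = 0.7) -> list[tuple[str, float]]:
    """Weighted arithmetic mean of min-max normalized BM25 and kNN scores.

    A document missing from one list contributes 0 for that list
    (a simplifying assumption of this sketch)."""
    lex, vec = min_max(lexical), min_max(vector)
    total = w_lexical + w_vector
    combined = {
        doc: (w_lexical * lex.get(doc, 0.0) + w_vector * vec.get(doc, 0.0)) / total
        for doc in lex.keys() | vec.keys()
    }
    # Highest combined score first; ties broken by document ID for stable output.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

Take `lexical = {"doc-a": 12.4, "doc-b": 7.1, "doc-c": 3.0}` and `vector = {"doc-b": 0.91, "doc-c": 0.88, "doc-d": 0.52}`. The fused order is doc-b, doc-c, doc-a, doc-d. doc-b ranks first because it scores well on both lists, even though doc-a has the best BM25 score.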
The problem is the cost model. OpenSearch Serverless bills a minimum of 2 OCUs (1 for indexing, 1 for search) for vector search. At $0.24/OCU-hour, that's approximately $345/month at rest (2 × $0.24 × 720 h), without a single query. In development, staging, or applications with low and variable load, that floor is unacceptable. I've seen teams running three environments (dev, staging, prod) with OpenSearch Serverless and paying ~$1,000/month before processing a single real vector.
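The floor is plain arithmetic: monthly floor = OCUs × price per OCU-hour × hours. The sketch below assumes a 720-hour month and the $0.24 us-east-1 price used throughout this playbook. At those values, ~$345 corresponds to a 2-OCU floor, and a 4-OCU floor would double it. The helper names are illustrative.

```python
OCU_HOUR_USD = 0.24    # us-east-1 estimate used in this playbook; check current pricing
HOURS_PER_MONTH = 720  # 30-day month; 730 is the other common convention


def monthly_floor_usd(min_ocus: float, environments: int = 1,
                      price: float = OCU_HOUR_USD,
                      hours: float = HOURS_PER_MONTH) -> float:
    """OCU cost billed every month whether or not a single query arrives."""
    return min_ocus * price * hours * environments


def idle_floor_usd(min_ocus: float, hours_with_load: float,
                   price: float = OCU_HOUR_USD,
                   hours: float = HOURS_PER_MONTH) -> float:
    """Part of the monthly floor spent on hours with no real traffic."""
    idle_hours = max(hours - hours_with_load, 0.0)
    return min_ocus * price * idle_hours
```

Some sample values:

- `monthly_floor_usd(2)` ≈ 345.60.
- `monthly_floor_usd(2, environments=3)` ≈ 1,036.80. This is the "~$1,000/month before a single real vector" scenario.
- `idle_floor_usd(2, hours_with_load=200)` ≈ 249.60, so about 72% of the floor pays for idle capacity.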
Automatic scaling is real but asymmetric: it scales up fast when load increases and scales down slowly. Load spikes can provision additional OCUs that take hours to scale back down. Monitor the SearchOCU and IndexingOCU metrics in CloudWatch and configure billing alerts before going to production.
For the right profile — agentic RAG with multiple searches per turn, hybrid search, rich filters, reasonably constant load above the floor — it's the correct choice and the cost is justified. For everything else, evaluate the other two paths first.
Aurora pgvector: the power of Postgres — and the tuning nobody does
If you already operate Aurora PostgreSQL, adding pgvector is the lowest-friction decision possible: one extension, one column type, one index. You keep ACID transactions, joins with relational data, row-level security, and the entire toolchain your team already knows. For cases where vectors need to live alongside relational metadata (versioned documents, per-user permissions, joins with business tables), this co-location eliminates an entire class of consistency problems.
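The "one extension, one column type, one index" setup is sketched below as a Python helper. It builds the DDL so you can review it or apply it with psql or any Postgres driver. The table and column names are hypothetical, and cosine distance is an assumed choice of metric. The dimension cap reflects pgvector's 2,000-dimension index limit for the vector type.

```python
import re

# pgvector can index `vector` columns of up to 2,000 dimensions (HNSW and IVFFlat).
MAX_INDEXED_DIMS = 2000
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def pgvector_setup_sql(table: str, dims: int) -> list[str]:
    """DDL for one extension, one vector column and one HNSW index.

    `table` is interpolated into SQL, so only simple lowercase identifiers
    from trusted configuration are accepted."""
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if not 1 <= dims <= MAX_INDEXED_DIMS:
        raise ValueError(f"dims must be in 1..{MAX_INDEXED_DIMS} to index as vector")
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "id bigserial PRIMARY KEY, "
        "document_id bigint NOT NULL, "  # hypothetical link to a business table
        "content text NOT NULL, "
        f"embedding vector({dims}) NOT NULL);",
        # Cosine-distance opclass: queries must ORDER BY embedding <=> ... to use it.
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw "
        f"ON {table} USING hnsw (embedding vector_cosine_ops);",
    ]
```

`pgvector_setup_sql("chunks", 1024)` returns the three statements in order. For embeddings above 2,000 dimensions, the usual workaround is pgvector's halfvec type, which is indexable up to 4,000 dimensions; this sketch does not cover it. On a table that already holds data, `CREATE INDEX CONCURRENTLY` avoids blocking writes during the build.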
What most teams ignore is index tuning. pgvector supports two types:
- IVFFlat: divides the vector space into clusters (the lists parameter) and searches only the closest ones (probes). Faster to build, less precise. Build the index after the data is loaded; otherwise the lists are unbalanced. Critical parameter: ivfflat.probes. Increasing it improves recall but hurts latency.
- HNSW: Hierarchical Navigable Small World graph. Better recall and latency, but consumes more memory and takes longer to build. Critical parameters: m (connections per node, default 16) and ef_construction (candidate queue size during build, default 64). For production with a ms SLA, HNSW is the right choice. You do need to size the instance memory and shared_buffers so the index stays cached for queries, and maintenance_work_mem so the graph fits in memory during the build.
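The starting points the pgvector README gives for both index types fit in a few lines. This sketch assumes you will still benchmark recall and latency on your own data. The helper names are illustrative.

```python
import math

# pgvector defaults: HNSW m=16, ef_construction=64; query-time hnsw.ef_search=40.
HNSW_DEFAULTS = {"m": 16, "ef_construction": 64, "ef_search": 40}


def ivfflat_start_params(rows: int) -> dict[str, int]:
    """README starting points, to be tuned against measured recall:
    lists = rows / 1000 up to 1M rows, sqrt(rows) above; probes = sqrt(lists)."""
    if rows <= 0:
        raise ValueError("build IVFFlat after loading data; rows must be > 0")
    lists = rows // 1000 if rows <= 1_000_000 else math.isqrt(rows)
    lists = max(lists, 1)
    return {"lists": lists, "probes": max(math.isqrt(lists), 1)}


def hnsw_with_clause(m: int = HNSW_DEFAULTS["m"],
                     ef_construction: int = HNSW_DEFAULTS["ef_construction"]) -> str:
    """WITH clause for CREATE INDEX ... USING hnsw. Higher values trade
    build time and memory for recall."""
    return f"WITH (m = {int(m)}, ef_construction = {int(ef_construction)})"
```

`ivfflat_start_params(500_000)` gives 500 lists and 22 probes. `ivfflat_start_params(10_000_000)` gives 3162 lists and 56 probes. These map to `CREATE INDEX ... USING ivfflat (embedding vector_cosine_ops) WITH (lists = 500)` and `SET ivfflat.probes = 22`.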
The silent problem is index bloat: frequent inserts, updates and deletes fragment the HNSW index. Periodic REINDEX CONCURRENTLY is necessary under continuous write loads. A poorly configured autovacuum can also slow search queries at critical moments by competing with them for CPU and I/O.
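One way to turn "monitor bloat" into a concrete check is sketched below. Record index bytes per live row right after a (re)build, then flag the index when that ratio grows past a threshold. Several parts are assumptions:

- The 1.5× threshold, the baseline and the index/table names are hypothetical.
- `n_live_tup` is only an estimate maintained by Postgres statistics.
- Size growth is a proxy for bloat, not a direct measurement.

```python
# Size and row-count query with psycopg-style named parameters
# (index and table names are supplied by the caller; hypothetical here).
SIZE_SQL = (
    "SELECT pg_relation_size(%(index)s::regclass) AS index_bytes, "
    "s.n_live_tup AS live_rows "
    "FROM pg_stat_user_tables s WHERE s.relname = %(table)s;"
)


def needs_reindex(baseline_bytes_per_row: float, index_bytes: int,
                  live_rows: int, max_growth: float = 1.5) -> bool:
    """Heuristic bloat check: index bytes per live row versus the value
    recorded right after the last (re)build. max_growth=1.5 is arbitrary."""
    if live_rows <= 0 or baseline_bytes_per_row <= 0:
        return False
    return index_bytes / live_rows > baseline_bytes_per_row * max_growth
```

When `needs_reindex` returns True, `REINDEX INDEX CONCURRENTLY` rebuilds the index without blocking writes. It requires PostgreSQL 12+ and cannot run inside a transaction block. Schedule it off-peak, since the rebuild itself consumes CPU, I/O and maintenance_work_mem.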
Practical rule: if you don't have someone on the team who knows what ef_search is and when to adjust it, think twice before choosing pgvector for a p99 < 100ms SLA in production.
Decision Matrix — which vector store for your case
OpenSearch Serverless (Vector Engine)
- Native hybrid search (BM25 + kNN) without extra code
- Rich metadata filters (pre/post-filter) native
- Automatic OCU scaling; no exposed index tuning
- Native and mature integration with Bedrock Knowledge Bases
- Ideal for agentic RAG with multiple searches per turn
- Minimum cost ~$345/month even without queries (2 OCUs: 1 indexing + 1 search)
- Asymmetric scale-down: scales up fast, down slowly
- No access to underlying index; limited recall debugging
- Prohibitive cost for dev/staging if not shared
USE when: production RAG with hybrid search, complex filters, constant load above cost floor. DON'T USE when: sporadic load, tight budget, or you only need simple similarity.
Aurora PostgreSQL + pgvector
- Zero additional cost if Aurora already exists in the stack
- Native joins with relational data (metadata, permissions, versions)
- ACID, row-level security, familiar Postgres toolchain
- Full index control (HNSW vs IVFFlat, parameters)
- Supports up to 16,000 dimensions (pgvector ≥ 0.7); indexes cap at 2,000 (vector) or 4,000 (halfvec)
- Index tuning is your responsibility (HNSW params, vacuum, bloat)
- No native hybrid search; pg_trgm is a workaround, not a solution
- Vector scalability limited by RDS instance (no automatic sharding)
- Index bloat under continuous write loads requires active maintenance
USE when: Aurora already exists, vectors need relational joins, team has Postgres tuning experience. DON'T USE when: you need real hybrid search, volume > 50-100M vectors, or team lacks index expertise.
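Without native hybrid search, "manual" means running two queries and fusing the ranked IDs in the application. One query is lexical, using pg_trgm similarity or Postgres full-text search. The other is a pgvector query. The sketch below uses Reciprocal Rank Fusion (RRF), a common fusion method for this. k = 60 is the conventional constant from the RRF literature, and the IDs are hypothetical.

```python
def reciprocal_rank_fusion(*rankings: list[str],
                           k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank),
    with rank starting at 1. No score normalization is needed."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

`reciprocal_rank_fusion(["doc-7", "doc-2", "doc-9"], ["doc-2", "doc-4", "doc-7"])` ranks doc-2, doc-7, doc-4, doc-9. Documents that appear high in both lists win. This fusion code, plus the two queries behind it, is the workaround you maintain yourself.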
S3 Vectors
- Near-zero idle cost (pays per GB + per query, not per compute)
- Scales to billions of vectors without infrastructure management
- Fully managed; simple object + query model
- Integration with Bedrock Knowledge Bases (GA 2025)
- Ideal for batch pipelines, archive corpora, historical embeddings
- Higher p99 latency; not suitable for ms SLA in interactive production
- No native hybrid search
- Advanced metadata filters limited compared to OpenSearch
- Newer product (GA 2025); tooling ecosystem still maturing
USE when: huge volume, sporadic access, storage cost is the driver, batch pipeline or offline RAG. DON'T USE when: ms latency is a requirement, hybrid search is needed, or the application is real-time interactive.
How to decide: 5 questions in order
Step 1: What is your required p99 latency?
If p99 < 200ms in interactive production → eliminate S3 Vectors. If p99 can be seconds (batch, offline pipeline) → S3 Vectors is a strong candidate. Test: define the SLA before choosing the store, not after.
Step 2: Do you already have Aurora PostgreSQL in production?
If yes → evaluate pgvector first. Calculate existing instance cost vs OpenSearch Serverless minimum cost. If vectors need joins with relational data → pgvector is strongly favored. If no Postgres → pgvector is not the lowest-friction path.
Step 3: Do you need hybrid search (vector + lexical)?
If yes → OpenSearch Serverless is the only one of the three with real native support. pgvector + pg_trgm works but is a workaround you'll maintain. S3 Vectors doesn't support it. Validate: test recall with real queries before assuming pure vector search is sufficient.
Step 4: What is the load profile (constant vs sporadic)?
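Calculate how the monthly OCU floor splits between hours with real load and idle hours. The OCU-hours consumed while real load is present are the useful part; the rest of the ~$345/month floor pays for idle capacity. If real load covers less than 50% of the month → OpenSearch Serverless idle cost probably isn't justified. Highly seasonal load → S3 Vectors or pgvector (if it already exists) are more cost-efficient.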
Step 5: Does your team have index tuning capability?
If choosing pgvector: define who owns HNSW params, vacuum schedule, and bloat monitoring before deploy. If nobody on the team knows what ef_construction is → either invest in training or choose OpenSearch Serverless. No middle ground: a poorly tuned pgvector index in production is a latency time bomb.
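The five steps can also be written down as a function, which makes the assumptions behind a decision reviewable in an ADR. This is one possible encoding. The order of the checks and the field names are choices of this sketch, not rules from the playbook.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    interactive_ms_sla: bool     # Step 1: interactive production with p99 < 200ms?
    needs_hybrid_search: bool    # Step 3: vector + lexical required (validated on real queries)
    has_aurora_postgres: bool    # Step 2: Aurora PostgreSQL already in production
    team_can_tune_indexes: bool  # Step 5: someone owns HNSW params, vacuum, bloat
    loaded_fraction: float       # Step 4: hours with real load / hours in the month (0..1)


def recommend(w: Workload) -> tuple[str, list[str]]:
    """Return (store, notes). The check order is an assumption of this sketch."""
    notes: list[str] = []
    if not w.interactive_ms_sla:
        return "S3 Vectors", ["seconds-level p99 is acceptable; still benchmark it"]
    if w.needs_hybrid_search:
        return "OpenSearch Serverless", ["budget the OCU floor as a fixed infrastructure cost"]
    if w.has_aurora_postgres:
        if w.team_can_tune_indexes:
            return "Aurora pgvector", ["document HNSW params and an index maintenance runbook"]
        notes.append("Aurora exists but nobody owns index tuning: train the team or pay the OCU floor")
    if w.loaded_fraction < 0.5:
        notes.append("load covers < 50% of the month: most of the OCU floor will sit idle")
    return "OpenSearch Serverless", notes
```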
Decision tree — vector store on AWS
Decision flow across three axes: p99 latency, existing Postgres, and load/cost profile. Each decision node leads to a recommended store or an additional qualification.
- I need · a vector store · for RAG on AWS
- p99 < 200ms · in interactive · production?
- S3 Vectors · ✓ Batch / Archive · ✓ Minimal cost · ✓ Billions of vectors
- Aurora PostgreSQL · already in · the stack?
- Aurora pgvector · ✓ Relational joins · ✓ Zero additional cost · ⚠️ Tuning needed
- Hybrid search · or rich · filters?
- Constant load · > 50% of the time · or critical SLA?
- OpenSearch Serverless · ✓ Native hybrid · ✓ Rich filters · ✓ Agentic RAG · ⚠️ ~$345/month minimum
- Aurora pgvector · ✓ Already have Postgres · ✓ Cost-efficient · ⚠️ Tuning mandatory
- OpenSearch Serverless · (load justifies the cost)
- S3 Vectors · (idle cost unacceptable)
Anti-patterns that surface in production
1. Choosing by tutorial, not by load profile. OpenSearch Serverless is the default in all Bedrock Knowledge Bases examples. That doesn't mean it's the right choice for you. If your load is sporadic or you're in dev/staging, the minimum cost will appear on your bill every month without delivering proportional value.
2. Ignoring OpenSearch Serverless idle cost. Automatic scaling goes up fast and comes down slowly. A load spike at 2 PM can keep OCUs provisioned until 5 PM. Without billing alerts and without monitoring the SearchOCU metric, you discover the problem at the end of the month. Configure CloudWatch alarms on OCU usage (aws cloudwatch put-metric-alarm) before go-live.
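Below is a sketch of the alarm as keyword arguments for boto3's CloudWatch `put_metric_alarm` call. A plain function builds them, so you can review and test the alarm without AWS access. It assumes the OpenSearch Serverless metrics are published in the AWS/AOSS namespace as SearchOCU and IndexingOCU, with a ClientId (account ID) dimension. Confirm the names in your account's CloudWatch console. The threshold of 4 OCUs, the period and the SNS topic are hypothetical.

```python
from typing import Optional


def ocu_alarm_params(account_id: str, metric: str = "SearchOCU",
                     threshold: float = 4.0,
                     sns_topic_arn: Optional[str] = None) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm on an OpenSearch
    Serverless OCU metric (namespace and dimension are assumptions)."""
    if metric not in ("SearchOCU", "IndexingOCU"):
        raise ValueError(f"unexpected metric: {metric}")
    params = {
        "AlarmName": f"aoss-{metric.lower()}-above-{threshold:g}",
        "Namespace": "AWS/AOSS",
        "MetricName": metric,
        "Dimensions": [{"Name": "ClientId", "Value": account_id}],
        "Statistic": "Maximum",
        "Period": 300,           # 5-minute buckets
        "EvaluationPeriods": 3,  # alarm after 15 minutes above the threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]
    return params
```

To apply it, call `boto3.client("cloudwatch").put_metric_alarm(**ocu_alarm_params(account_id, sns_topic_arn=topic_arn))`. Do this once for SearchOCU and once with `metric="IndexingOCU"`. The account ID and topic ARN are placeholders.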
3. Deploying pgvector without index tuning. The default pgvector behavior without an index is exact search (sequential scan) — works perfectly in development with 10,000 vectors, explodes in production with 10 million. Without explicit CREATE INDEX USING hnsw and without configuring ef_search at runtime, you'll get second-level latency where you expected milliseconds. And the worst part: it will work in tests and only fail under real load.
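The fix for this anti-pattern has two parts: the HNSW index, created once, and ef_search, set per query. The sketch below covers the query side as a helper that returns the statements. The table and columns are hypothetical. Cosine distance (`<=>`) assumes an index built with vector_cosine_ops, and `%(query_vec)s` is a psycopg-style parameter for the query embedding.

```python
import re

_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def knn_query_sql(table: str, ef_search: int = 40, limit: int = 10) -> list[str]:
    """Statements for one approximate kNN query with a per-transaction ef_search.

    SET LOCAL lasts only until the end of the transaction, so a higher
    ef_search does not leak into other queries on a pooled connection."""
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    ef_search, limit = int(ef_search), int(limit)
    if ef_search < limit:
        # Without iterative scans, an HNSW scan yields at most ef_search rows.
        raise ValueError("ef_search should be >= limit")
    return [
        "BEGIN;",
        f"SET LOCAL hnsw.ef_search = {ef_search};",
        # <=> is cosine distance; it matches an index built with vector_cosine_ops.
        f"SELECT id, content FROM {table} "
        f"ORDER BY embedding <=> %(query_vec)s LIMIT {limit};",
        "COMMIT;",
    ]
```

Run EXPLAIN on the SELECT and check for an Index Scan on the HNSW index rather than a Seq Scan. That is the quickest way to confirm the query is not silently doing exact search.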
Rule of thumb
If you're paying for compute, demand the p99. If you're paying for storage, accept the latency. OpenSearch Serverless charges for compute (OCU-hour) — demand ms latency in return. S3 Vectors charges for storage and query — accept higher latency as the cost trade-off. Aurora pgvector charges for the instance you already have — incremental cost is low, but operational cost (tuning) is high. Map what you're paying for and what you're getting in return.
In most projects I architect, the vector store decision is determined before any benchmark: I look at the existing stack first. If there's Aurora PostgreSQL with reasonable load, I start with pgvector. The extension is available, incremental cost is near zero, and the team already knows how to operate Postgres. I document the chosen HNSW parameters, create an index maintenance runbook, and track index size (pg_relation_size) alongside the usage stats in pg_stat_user_indexes to catch bloat. This solves 60% of cases.
For the other 40% — when there's no Postgres, when hybrid search is a real (not aspirational) requirement, or when the RAG is agentic with multiple searches per turn — I use OpenSearch Serverless. But never without first calculating the monthly minimum cost and presenting it to the client/stakeholder as a fixed infrastructure cost, not a variable cost.
S3 Vectors I reserve for large-scale embedding pipelines (historical corpus ingestion, legacy document embeddings) where access is sporadic and volume is large. It's a new product and I wouldn't yet put it as the primary store for an interactive production RAG without extensive latency benchmarks for the specific case.
What I never do: choose the store by tutorial without going through the five decision steps. The RAG bill explodes exactly there.
Verdict
There is no right vector store — there is the right one for your profile. OpenSearch Serverless is the most capable for complex RAG, but you pay the floor every month regardless of usage. Aurora pgvector is the most cost-efficient choice if Postgres already exists, but index tuning is your problem. S3 Vectors is the right choice for volume and rest cost, but not for ms p99. Decide by the three axes — latency, cost, operations — not by the tutorial's feature list. The RAG bill explodes when you choose by the lowest-friction demo path and discover the real cost model in production.
Ask Fernando about this
Get a focused answer about this study from my AI assistant, grounded in my work.