Secure Multi-Tenant RAG: ADR for Two-Layer Authorization
Authorization in multi-tenant RAG is not an implementation detail — it is an architectural decision with direct consequences for compliance, cross-tenant data leakage, and LLM attack surface. In this ADR, I document the forces that led me to adopt a defense-in-depth pattern with Amazon Verified Permissions and Bedrock Knowledge Bases, the alternatives I discarded, and the operational consequences that must be managed.
When a RAG system serves multiple tenants in a regulated financial environment, the question is not 'how do I integrate the model?' — it is 'how do I guarantee that client A's document never surfaces in client B's response, even under partial misconfiguration?' This ADR records the architectural decision I made to solve exactly that problem.
Context and Forces: Why Authorization in RAG Is Different
In conventional single-tenant RAG systems, access control is solved at the application layer: you authenticate the user, filter documents by ownership, and pass context to the model. The problem begins when you scale this to dozens of tenants with distinct SLAs, heterogeneous permission schemas, and regulatory obligations such as LGPD, SOC 2, and PCI-DSS.
The central force here is context contamination risk. In a RAG architecture, retrieval determines what the model sees. If the vector search mechanism returns document chunks from another tenant — due to metadata error, filter failure, or prompt injection — the model processes and potentially reproduces confidential information. This is not hypothetical: in penetration tests I have conducted on enterprise RAG platforms, I was able to extract context from other tenants by manipulating the embedding query before any application-layer filter was applied.
The second force is intra-tenant permission heterogeneity. Within a single financial tenant, a junior analyst should not see the same documents as a portfolio manager. This requires attribute-based access control (ABAC), not just role-based. Most RAG implementations I see in production ignore this dimension and treat the tenant as the atomic authorization unit — which is insufficient for any regulated environment.
The third force is the model's own attack surface. LLMs are susceptible to prompt injection. If a malicious document in the corpus instructs the model to ignore restrictions or reveal system context, authorization that lives only in the prompt is not sufficient. The defense must exist before the model sees any token.
Additional Forces: Operability, Audit, and Latency
Beyond security forces, there are three operational forces that shaped this decision. The first is regulatory auditability. In financial environments, every access decision must be recorded with enough context to respond to an audit: who accessed, which document, under which policy justification, at which timestamp. This rules out approaches where authorization is implicit in application code without a structured trail.
The second force is inference latency. A RAG cycle already carries embedding latency (typically 50-150ms on amazon.titan-embed-text-v2), vector search (20-80ms on OpenSearch Serverless with k-NN), and model inference (500ms-3s depending on the model and context size). Adding an authorization layer that introduces more than 30-50ms of overhead is unacceptable for product SLAs. This rules out solutions requiring synchronous calls to slow external systems on the critical path.
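The latency budget above can be checked with a few lines of arithmetic. This is a minimal sketch that only sums the ranges quoted in the paragraph; the function names are illustrative, and treating the stages as strictly sequential is an assumption.

```python
# Latency ranges in milliseconds, taken from the figures quoted above.
STAGES_MS = {
    "embedding": (50, 150),
    "vector_search": (20, 80),
    "model_inference": (500, 3000),
}


def pipeline_range_ms(stages):
    """Return (best_case, worst_case) total latency, assuming sequential stages."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


def overhead_share(auth_ms, baseline_ms):
    """Fraction of latency that an authorization step adds on top of a baseline."""
    return auth_ms / baseline_ms


best, worst = pipeline_range_ms(STAGES_MS)
print(best, worst)                         # 570 3230
print(f"{overhead_share(50, best):.1%}")   # 8.8%
```

Against the best-case cycle, a 50ms authorization step is already close to a tenth of the total, which is why the budget is capped there.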
The third force is policy management at scale. With dozens of tenants and hundreds of distinct roles, maintaining authorization policies as application code becomes unmanageable. You need a centralized, versioned, and auditable policy store with native ABAC support — not a DynamoDB table with authorization logic scattered across Lambda functions.
These three operational forces, combined with the security forces, eliminated the simpler approaches and forced me to consider a defense-in-depth pattern with specialized components for each layer.
Options Considered for Multi-Tenant RAG Authorization
Option 1: Metadata Filter Only (Application Layer)
Pros:
- Simple implementation, no additional dependencies
- Minimal latency: the filter is applied in the query payload
- No additional service cost
Cons:
- Single point of failure: a code bug exposes cross-tenant data
- No native intra-tenant ABAC support
- No structured audit trail of access decisions
- Policies coupled to code, which makes them hard to audit and version
Discarded: insufficient for regulated environments
Option 2: Separate Knowledge Bases per Tenant
Pros:
- Full physical isolation between tenants
- No context contamination risk at retrieval
- IAM resource-based policies per Knowledge Base
Cons:
- High operational cost: each KB has fixed OpenSearch Serverless cost (~$700/month minimum per collection)
- Does not solve intra-tenant control (roles within the same tenant)
- Managing dozens of KBs becomes operationally complex
- Default limit of 5 Knowledge Bases per account (Service Quota)
Partially viable only for a very small number of high-value tenants
Option 3: Defense in Depth (Verified Permissions + Metadata Filter)
Pros:
- Two independent layers: failure in one does not compromise the other
- AVP provides native ABAC with Cedar policies that are auditable and versionable
- AVP IsAuthorized latency: ~10-20ms p99 (regional API, no cold start)
- Native audit trail via CloudTrail for every authorization decision
- Metadata filter on Knowledge Base as second enforcement layer
Cons:
- Cedar policy modeling complexity for ABAC, with a real learning curve
- Additional AVP cost: $0.10 per 1000 authorization requests
- Requires entity synchronization (users, roles) between IdP and AVP
Adopted decision: best balance of security, auditability, and operational cost
Option 4: OPA (Open Policy Agent) Self-Managed
Pros:
- Maximum policy flexibility with Rego
- No vendor lock-in to managed service
- Integration with CNCF ecosystem (OPAL for sync)
Cons:
- Operating an OPA cluster on EKS adds significant SRE overhead
- No native audit trail integrated with CloudTrail
- Variable latency depending on bundle size and Rego complexity
- HA, scaling, and patching responsibility falls on the team
Discarded for this context: unjustifiable operational overhead when AVP solves the problem
The Decision: Defense in Depth with Independent Layers
I adopted the defense-in-depth pattern with two independent authorization layers. Layer 1 is Amazon Verified Permissions (AVP) with Cedar policies, operating before retrieval. Layer 2 is the native metadata filter of Bedrock Knowledge Base, operating at vector search time. The independence between layers is the critical point: a Cedar misconfiguration does not disable the metadata filter, and vice versa.
The choice of Cedar as the policy language is deliberate. Cedar is a policy language with formal semantics and a deterministic evaluation model, unlike Rego, which allows constructs that can produce unexpected results in edge cases. For financial environments, the predictability of the evaluation engine is as important as the expressiveness of the language. Additionally, AVP validates policies against the policy store schema (strict validation mode) when they are created or updated, so policies that reference undefined entity types, actions, or attributes are rejected before deployment, something OPA does not offer natively. IsAuthorizedWithToken is the runtime authorization call that takes the caller's identity token; it plays no part in policy analysis.
Entity modeling in AVP follows a hierarchical schema: Tenant > Group > User, with document attributes such as classification_level, tenant_id, and document_type. A typical policy for a financial analyst would be: permit(principal in Group::"analysts", action == Action::"ReadDocument", resource) when { resource.classification_level <= principal.clearance_level && resource.tenant_id == principal.tenant_id };. This policy evaluates in ~12ms p50 and ~18ms p99 in my measurement with medium-sized entity payloads.
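To make the entity model concrete, here is a minimal sketch of how the principal, resource, and their attributes from the policy above could be packaged into a request for the Verified Permissions IsAuthorized call (boto3 `verifiedpermissions` client, `is_authorized`). The `user` and `document` dict shapes, the `User` and `Document` entity type names, and both helper functions are assumptions for illustration; a real policy store may use namespaced types. The client is passed in as a parameter so the logic can be exercised without AWS access.

```python
def build_is_authorized_request(policy_store_id, user, document):
    """Build keyword arguments for the Verified Permissions IsAuthorized call.

    `user` and `document` are plain dicts (hypothetical shapes) carrying the
    attributes that the Cedar policy in the text compares.
    """
    principal = {"entityType": "User", "entityId": user["id"]}
    resource = {"entityType": "Document", "entityId": document["id"]}
    groups = [{"entityType": "Group", "entityId": g} for g in user["groups"]]
    return {
        "policyStoreId": policy_store_id,
        "principal": principal,
        "action": {"actionType": "Action", "actionId": "ReadDocument"},
        "resource": resource,
        "entities": {
            "entityList": [
                {
                    "identifier": principal,
                    "attributes": {
                        "tenant_id": {"string": user["tenant_id"]},
                        "clearance_level": {"long": user["clearance_level"]},
                    },
                    # Group membership is what `principal in Group::"analysts"` tests.
                    "parents": groups,
                },
                {
                    "identifier": resource,
                    "attributes": {
                        "tenant_id": {"string": document["tenant_id"]},
                        "classification_level": {"long": document["classification_level"]},
                    },
                    "parents": [],
                },
            ]
        },
    }


def is_allowed(avp_client, request):
    """Return True only for an explicit ALLOW; anything else counts as a deny."""
    response = avp_client.is_authorized(**request)
    return response.get("decision") == "ALLOW"
```

In a Lambda this would be called as `is_allowed(boto3.client("verifiedpermissions"), request)`. Treating every response other than `ALLOW` as a deny keeps the helper fail-closed.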
The metadata filter on the Knowledge Base combines two conditions under andAll: { "equals": { "key": "tenant_id", "value": "<tenant-from-jwt>" } } and { "lessThanOrEquals": { "key": "clearance_level_max", "value": <user-clearance> } }. The range operator mirrors the <= comparison in the Cedar policy, so both layers admit documents at or below the user's clearance rather than only those at exactly that level. These metadata fields are indexed in OpenSearch Serverless and the filter is applied before similarity ranking, ensuring that chunks from other tenants never enter the model's context.
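The filter can be built in one place so that the tenant and clearance values always come from the same source. This is a minimal sketch: the helper names and the default `top_k` are assumptions, clearance is assumed to be stored as a number, and `lessThanOrEquals` is used on the assumption that documents at or below the user's clearance should be returned.

```python
def build_retrieval_filter(tenant_id, user_clearance):
    """Build the metadata filter for a Bedrock Knowledge Base Retrieve call."""
    if not tenant_id:
        # Refuse to build an unscoped filter: a missing tenant must never
        # degrade into a cross-tenant search.
        raise ValueError("tenant_id must come from a verified JWT claim")
    return {
        "andAll": [
            {"equals": {"key": "tenant_id", "value": tenant_id}},
            {"lessThanOrEquals": {"key": "clearance_level_max", "value": user_clearance}},
        ]
    }


def build_retrieval_configuration(tenant_id, user_clearance, top_k=5):
    """Wrap the filter in the retrievalConfiguration structure."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "filter": build_retrieval_filter(tenant_id, user_clearance),
        }
    }
```

The result is passed as `retrievalConfiguration` to the `retrieve` operation of the `bedrock-agent-runtime` client, alongside `knowledgeBaseId` and `retrievalQuery`.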
Two-Layer Authorization Flow for Multi-Tenant RAG
Every RAG request passes through two independent authorization layers before any chunk reaches the model. Failure of either layer blocks access without affecting the other.
- API Gateway (JWT Authorizer)
- Auth Lambda (IsAuthorizedWithToken)
- Amazon Verified Permissions (Cedar Policies)
- Bedrock Knowledge Base (Retrieve API)
- OpenSearch Serverless (k-NN + metadata filter)
- S3 Document Store (tenant_id prefix isolation)
- Bedrock InvokeModel (Claude / Titan)
- Grounded Response (authorized context only)
- CloudTrail (AVP decisions + KB calls)
- CloudWatch (SLO: auth latency p99)
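The ordering of the components above can be sketched as a single function in which each layer is an injected callable. All names here (`answer_query`, `AccessDenied`, the claim keys) are hypothetical; the point is only the sequence: policy decision first, filtered retrieval second, model last.

```python
class AccessDenied(Exception):
    """Raised when Layer 1 (the policy decision) rejects the request."""


def answer_query(claims, query, authorize, retrieve, generate):
    """Run one RAG request through both authorization layers.

    authorize(claims) -> bool             Layer 1, e.g. a Verified Permissions call
    retrieve(query, tenant_id, clearance) Layer 2, retrieval with a metadata filter
    generate(query, chunks) -> str        model invocation on authorized context
    """
    # Identity comes only from verified JWT claims, never from the request body.
    tenant_id = claims["tenant_id"]
    clearance = claims["clearance_level"]

    # Layer 1: policy decision before any retrieval happens.
    if not authorize(claims):
        raise AccessDenied(f"policy denied request for tenant {tenant_id}")

    # Layer 2: retrieval constrained by tenant and clearance metadata.
    chunks = retrieve(query, tenant_id, clearance)

    # Only chunks that passed both layers reach the model.
    return generate(query, chunks)
```

Because a denial raises before `retrieve` is ever called, a Layer 1 rejection never touches the vector store, and the model only sees what Layer 2 returned.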
Implementation: Details That Matter in Production
There are four implementation details that most reference architectures omit and that make a real difference in financial production.
1. AVP entity synchronization. AVP evaluates policies against the entity data (attributes and group memberships) supplied with each authorization request or derived from the identity token, so that data must stay consistent with the IdP. I use an async pipeline: Cognito User Pool triggers → EventBridge → Lambda → the entity store that the authorization Lambda reads when it builds the AVP request. The consistency SLA is eventual (~2-5s), which is acceptable for role changes, but requires the system to handle the case of a newly promoted user whose new role has not yet propagated. The solution is a 60s cache TTL in the authorization Lambda with a fallback that re-fetches the entity data if AVP returns an unexpected DENY.
2. Document metadata in the ingestion pipeline. During Knowledge Base ingestion, each chunk must carry tenant_id, clearance_level_max, document_type, and created_at as structured metadata. This is done via a custom Lambda in the Glue pipeline that processes documents before sending them to Bedrock KB via StartIngestionJob. Missing or incorrect metadata is the primary failure vector in this pattern — I implement schema validation with AWS Glue Data Quality before ingestion.
3. Fail-closed handling of AVP errors. When AVP returns a 5xx error (rare, but it happens), the correct behavior is to fail closed: deny access and log the error, never bypass. I implement this with a circuit breaker in the authorization Lambda: after 3 consecutive AVP failures within 30s, the circuit opens and all requests are denied with HTTP 503 until the circuit closes again. This is counter-intuitive for product teams, but it is the correct behavior for financial systems.
4. Authorization layer observability. I instrument the authorization Lambda with OpenTelemetry, emitting spans for each AVP call with attributes avp.decision, avp.tenant_id, avp.policy_id, and avp.latency_ms. These spans feed a Datadog dashboard with SLOs: auth_latency_p99 < 30ms and auth_deny_rate_anomaly (alert if DENY rate increases more than 3σ in 5 minutes — indicator of attack or policy bug).
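The fail-closed circuit breaker from item 3 can be sketched as follows. The thresholds (3 failures, 30s window) come from the text. The class name, the `CircuitOpen` exception (which a handler would map to HTTP 503), and the 30s cooldown before the circuit allows a trial call are assumptions, since the text does not say how the circuit closes.

```python
import time


class CircuitOpen(Exception):
    """Raised while the circuit is open; callers map this to HTTP 503."""


class FailClosedBreaker:
    def __init__(self, max_failures=3, window_s=30.0, cooldown_s=30.0,
                 clock=time.monotonic):
        self._max_failures = max_failures
        self._window_s = window_s
        self._cooldown_s = cooldown_s  # assumed value, not from the text
        self._clock = clock
        self._failures = []            # timestamps of consecutive failures
        self._opened_at = None

    def call(self, authorize):
        """Return True only on an explicit allow; errors always deny."""
        now = self._clock()
        if self._opened_at is not None:
            if now - self._opened_at < self._cooldown_s:
                raise CircuitOpen("authorization backend unavailable")
            # Cooldown elapsed: close the circuit and allow a trial call.
            self._opened_at = None
            self._failures.clear()
        try:
            allowed = authorize()
        except Exception:
            # Fail closed: record the failure and deny, never bypass.
            self._failures = [t for t in self._failures
                              if now - t <= self._window_s]
            self._failures.append(now)
            if len(self._failures) >= self._max_failures:
                self._opened_at = now
            return False
        self._failures.clear()  # a success resets the consecutive count
        return bool(allowed)
```

The clock is injectable so the behavior can be tested deterministically; in the Lambda, `authorize` would wrap the AVP call and raise on a 5xx response.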
Cedar Policy Management at Scale: The Real Problem
The most underestimated part of this pattern is not the technical integration with AVP — it is the governance of the Cedar policy lifecycle. In an environment with 50 tenants and 200 distinct roles, you quickly accumulate hundreds of policies. Without engineering discipline, this becomes an authorization system that no one fully understands.
My approach is to treat Cedar policies as first-class Infrastructure as Code. Each policy lives in a Git repository with mandatory PR review by two security engineers. The CI pipeline uses cedar-policy-cli to validate syntax and run policy unit tests (yes, you can write unit tests for Cedar policies) before any merge. Deployment is done via CodePipeline with manual approval for changes affecting more than 10 entities.
A pattern I found to be critical is the explicit deny policy for unclassified documents. By default, Cedar denies everything not explicitly permitted, but documents without a defined clearance_level_max metadata field (due to ingestion failure) are in an ambiguous state. I add an explicit forbid policy that uses Cedar's has operator to test for the attribute: forbid(principal, action, resource) when { !(resource has clearance_level_max) };. Because a matching forbid always overrides any permit, this ensures that documents with corrupted metadata are never accessible, regardless of any other policy.
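Why the explicit forbid matters can be shown with a toy model of Cedar's decision rule: default deny, and any matching forbid overrides every matching permit. This is not the Cedar engine, only a sketch of its semantics in plain Python; the policy functions, the `is_admin` flag, and the broad tenant-admin permit are hypothetical and exist to show a permit that never looks at clearance.

```python
def permit_analyst(principal, resource):
    """Models the analyst permit: same tenant and sufficient clearance."""
    try:
        return (resource["tenant_id"] == principal["tenant_id"]
                and resource["clearance_level_max"] <= principal["clearance_level"])
    except KeyError:
        # A policy whose condition errors is skipped, so it does not match.
        return False


def permit_tenant_admin(principal, resource):
    """Hypothetical broad permit that never inspects clearance."""
    return (principal.get("is_admin", False)
            and resource.get("tenant_id") == principal["tenant_id"])


def forbid_unclassified(principal, resource):
    """Models: forbid ... when { !(resource has clearance_level_max) }."""
    return "clearance_level_max" not in resource


def decide(principal, resource, permits, forbids):
    """Default deny; a matching forbid wins over any matching permit."""
    if any(f(principal, resource) for f in forbids):
        return "DENY"
    if any(p(principal, resource) for p in permits):
        return "ALLOW"
    return "DENY"


admin = {"tenant_id": "t1", "clearance_level": 5, "is_admin": True}
broken_doc = {"tenant_id": "t1"}  # ingestion lost clearance_level_max
permits = [permit_analyst, permit_tenant_admin]

print(decide(admin, broken_doc, permits, []))                     # ALLOW
print(decide(admin, broken_doc, permits, [forbid_unclassified]))  # DENY
```

Without the forbid, the broad permit lets the unclassified document through; with it, the document is denied no matter which permits exist. The same cases translate directly into policy unit tests.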
For the policy drift problem — when policies in AVP diverge from what is in Git — I implement a reconciliation Lambda that runs every 15 minutes, compares the hash of active policies in AVP with the expected state in S3 (pipeline artifact), and emits a CloudWatch alarm if there is divergence. This detects manual changes outside the pipeline, which is the primary misconfiguration vector in regulated environments.
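The comparison at the heart of the reconciliation job can be sketched as a pure function over two mappings of policy identifier to policy text. How those mappings are loaded (from the S3 artifact and from AVP) is left out, and keying both sides by a shared identifier is an assumption, as is the whitespace normalization applied before hashing.

```python
import hashlib


def policy_digest(policy_text):
    """Hash a policy after trimming whitespace so formatting noise is ignored."""
    normalized = "\n".join(
        line.strip() for line in policy_text.strip().splitlines()
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_drift(expected, active):
    """Compare expected (Git/S3) policies with active (AVP) policies.

    Both arguments map a policy identifier to its Cedar source text.
    Returns sorted lists of identifiers that are missing, unexpected, or changed.
    """
    exp = {pid: policy_digest(text) for pid, text in expected.items()}
    act = {pid: policy_digest(text) for pid, text in active.items()}
    return {
        "missing": sorted(exp.keys() - act.keys()),
        "unexpected": sorted(act.keys() - exp.keys()),
        "changed": sorted(pid for pid in exp.keys() & act.keys()
                          if exp[pid] != act[pid]),
    }


def has_drift(report):
    """True when any category is non-empty, i.e. the alarm should fire."""
    return any(report.values())
```

Reporting the three categories separately makes the alarm actionable: an unexpected policy points to a manual change in the console, while a missing one points to a failed or partial deployment.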
Consequences and Risks That Must Be Managed
1. Eventual consistency is a real risk. AVP entity synchronization is asynchronous. An urgent access revocation (e.g., termination of an employee with access to confidential documents) can take up to 5s to propagate, and longer if the authorization Lambda is still serving entity data from its 60s cache. For high-urgency cases, implement an immediate invalidation mechanism via Cognito User Pool (disable the user) that is checked before the AVP call.
2. The metadata filter is not cryptography. The metadata filter in OpenSearch Serverless is a query restriction, not a cryptographic control. An operator with direct OpenSearch Serverless access can bypass it. For maximum-sensitivity documents, consider field-level encryption with a per-tenant KMS CMK, where the decryption key is only released after a successful AVP authorization.
3. AVP cost at high volume. At $0.10 per 1000 requests, a system with 1M RAG requests/day generates $3,000/month in AVP authorization alone. At scale, evaluate decision caching with a short TTL (30-60s) for identical queries from the same user — but document this as a security trade-off, since a revocation will not be immediately effective for cached queries.
4. Prompt injection is still a threat. This pattern protects retrieval, but not the model itself. A malicious document that passes the filters (because it legitimately belongs to the tenant) can still attempt to manipulate the model via prompt injection. Add Bedrock Guardrails with the PROMPT_ATTACK content filter enabled and monitor the guardrail intervention metrics in CloudWatch.
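The cost figure in item 3 and the caching trade-off it mentions can be sketched together. The price per 1000 requests is the one quoted in the text; the 40% cache hit rate in the example, the 30-day month, and the `DecisionCache` class are assumptions for illustration.

```python
import time

PRICE_PER_1000_REQUESTS = 0.10  # USD, figure quoted in the text


def monthly_avp_cost(requests_per_day, cache_hit_rate=0.0, days=30):
    """Estimated monthly cost; cached decisions do not reach AVP."""
    billable = requests_per_day * days * (1.0 - cache_hit_rate)
    return billable / 1000 * PRICE_PER_1000_REQUESTS


class DecisionCache:
    """Short-TTL cache of authorization decisions.

    Trade-off: a revocation is not effective for a cached key until the
    entry expires, so the TTL bounds the staleness window.
    """

    def __init__(self, ttl_s=30.0, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        decision, stored_at = entry
        if self._clock() - stored_at >= self._ttl_s:
            del self._entries[key]  # expired: force a fresh AVP call
            return None
        return decision

    def put(self, key, decision):
        self._entries[key] = (decision, self._clock())


print(f"{monthly_avp_cost(1_000_000):.2f}")                      # 3000.00
print(f"{monthly_avp_cost(1_000_000, cache_hit_rate=0.4):.2f}")  # 1800.00
```

A cache key would typically combine the user, action, and resource, for example `(user_id, "ReadDocument", document_id)`, so that a decision is never reused across principals.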
Reference Numbers for Sizing
Anti-Patterns I Frequently Observe in Multi-Tenant RAG
- Authorization only in the system prompt: 'You may only answer about tenant X documents' — LLMs are not enforcement boundaries, they are text processing actors
- Tenant ID extracted from request body (not from verified JWT) — allows trivial tenant spoofing
- Shared Knowledge Base without metadata filter — any application logic failure exposes cross-tenant data
- Cedar policies hardcoded in Lambda instead of managed in AVP Policy Store — impossible to audit, version, or revoke without redeploy
- No metadata validation in the ingestion pipeline — documents with missing tenant_id become accessible to all tenants if the filter fails
- Fail open on authorization errors — 'better to let through than block legitimate users' is the wrong reasoning in financial systems
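The missing-metadata anti-pattern is the easiest one to guard against mechanically. The sketch below checks the four fields the ingestion pipeline is expected to attach to each chunk; it is plain Python rather than AWS Glue Data Quality, and the field types (a numeric clearance, `created_at` as a string) are assumptions.

```python
# Required metadata fields and their assumed Python types.
REQUIRED_FIELDS = {
    "tenant_id": str,
    "clearance_level_max": int,
    "document_type": str,
    "created_at": str,
}


def validate_chunk_metadata(metadata):
    """Return a list of problems; an empty list means the chunk may be ingested."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in metadata:
            errors.append(f"missing field: {field}")
            continue
        value = metadata[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(f"wrong type for {field}: {type(value).__name__}")
        elif expected_type is str and not value.strip():
            errors.append(f"empty value for {field}")
    return errors


def partition_chunks(chunks):
    """Split chunks into (accepted, rejected) so bad metadata never reaches the index."""
    accepted, rejected = [], []
    for chunk in chunks:
        problems = validate_chunk_metadata(chunk.get("metadata", {}))
        if problems:
            rejected.append((chunk, problems))
        else:
            accepted.append(chunk)
    return accepted, rejected
```

Rejecting a chunk at ingestion is the fail-closed choice here: a document that never enters the index cannot be exposed by a later filter bug.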
AWS Well-Architected Framework Assessment
Security
Defense in depth with two independent layers (AVP + metadata filter). IAM least-privilege for Bedrock KB and AVP. KMS CMK for data at rest in S3 and OpenSearch. CloudTrail enabled for all AVP calls. Bedrock Guardrails for prompt injection. No tenant credentials in code — everything via verified JWT claims.
Reliability
Circuit breaker on AVP failures with fail-closed behavior. Multi-AZ OpenSearch Serverless by default. Retry with exponential backoff and jitter on Bedrock calls (max 3 attempts, 100ms base backoff). Policy drift alarm to detect misconfiguration outside the pipeline.
Performance efficiency
AVP p99 ~18ms — within latency budget. Optional decision cache (30-60s TTL) to reduce latency and cost. OpenSearch Serverless scales automatically without capacity management. Authorization Lambda configured with 512MB and Provisioned Concurrency to eliminate cold start during peak hours.
In practice, what concerns me most about this pattern is not the technical integration — it is the governance of Cedar policies over time. Teams that do not treat policies as first-class code invariably accumulate policy debt that becomes an audit risk. If I had to give a single piece of advice: implement Cedar policy unit tests on day zero, before any policy goes to production — the ability to test 'the junior analyst of tenant A should NOT see documents classified as CONFIDENTIAL' as an automated test is what separates an auditable authorization system from one you only discover is wrong during an incident. The hard-won lesson here is that the second defense layer (metadata filter) exists precisely for the day someone merges an incorrect Cedar policy without realizing it.
Verdict: Adopt the Defense-in-Depth Pattern, But Govern the Policies
For any multi-tenant RAG system in a regulated financial environment, the defense-in-depth pattern with Amazon Verified Permissions (Cedar) in Layer 1 and metadata filter in Bedrock Knowledge Base in Layer 2 is the correct approach. The independence of the layers is the most valuable property: it transforms misconfiguration events from catastrophic to detectable and contained incidents. The additional operational cost (~$3k/month in AVP at 1M req/day, reducible with caching) is justifiable in any environment where a cross-tenant data breach would have regulatory consequences. The real investment is not in infrastructure — it is in governance: Cedar policies as IaC, automated tests, CI/CD pipeline for policies, and drift monitoring. Without this, you have the right architecture with the wrong process, and the process is what fails in production.