Playbook: Which AWS AI Service to Use — The Decision Tree
Listen to study
generated on playGenerated only on first play
Powered by Amazon Polly + OmniVoice
Bedrock, SageMaker, Amazon Q, AgentCore, and self-hosted GPU solve different problems — but hype pushes everyone toward Bedrock by default. This playbook delivers a decision tree, a trade-off matrix, and rules of thumb so you choose by the problem, not the trend.
The wrong question is 'which AWS AI service is best?' The right question is 'what problem am I solving?' Five services, five distinct contexts — and the wrong choice costs months of rework, surprise bills, or a system that never needed to be built from scratch.
What you'll be able to decide after this playbook
Quick Reference — Services in Scope
- Services covered
- Amazon Bedrock, SageMaker AI, Amazon Q, Bedrock AgentCore, Self-hosted (EKS/EC2 GPU)
- Domain
- Generative AI and ML on AWS
- Bedrock pricing model
- Pay-per-token (on-demand) or Provisioned Throughput (reserved capacity)
- SageMaker pricing model
- Per instance-hour (training + inference) + S3/EBS storage
- Amazon Q Business pricing model
- Per user/month (Lite ~$3, Pro ~$20 — verify current pricing)
- Bedrock AgentCore
- GA 2025; managed runtime for agents with tools, memory, gateway, and native observability
- Self-hosted GPU
- EC2 p4d/p5/g5 or EKS + Karpenter; high fixed cost, full control
The mental model that unlocks everything: abstraction layers vs. control layers
Think of the five services as a single axis: on one end, maximum abstraction and delivery speed; on the other, maximum control and cost efficiency at scale. No point on the axis is better than another — each is optimal for a maturity stage and a set of constraints.
Amazon Q sits at the abstraction extreme: you don't write AI code, don't manage models, don't think about tokens. You connect data sources, configure permissions, and deliver productivity. It's the right service when the problem is access to corporate knowledge, not building an AI system.
Amazon Bedrock is the next step down: you consume foundation models via API (Anthropic Claude, Meta Llama, Mistral, Amazon Titan, Cohere, Stability AI, and others) without managing inference infrastructure. The contract is simple — you send tokens, receive tokens, pay per token. It's the correct starting point for most LLM production use cases.
Bedrock AgentCore is the runtime layer for when plain Bedrock isn't enough: when your agent needs persistent memory across sessions, tool orchestration with retry and timeout, a security gateway for model calls, and structured observability (traces, spans, latency metrics per step). Without AgentCore, you build all of this by hand — and you invariably build it wrong the first time.
SageMaker AI is where you go when the model you need doesn't exist in the Bedrock catalog, or when you need fine-tuning with your proprietary data, or when managed inference latency doesn't meet your SLA. It's a complete ML platform — experiments, pipelines, feature store, model registry, inference endpoints. Powerful, but expensive in engineering time.
Self-hosted EKS + GPU is the control extreme: you operate the cluster, manage CUDA drivers, configure GPU node autoscaling, monitor VRAM utilization. Cost per token can be a fraction of Bedrock at high volumes, but operational cost and availability risk are entirely yours. It makes sense when you have a specific model not in Bedrock, regulatory constraints preventing data from leaving your VPC, or volume so high that pay-per-token becomes prohibitive.
Why the 'let's go Bedrock' default is a dangerous shortcut
Bedrock is excellent — but it became the 'React' of the AWS AI world: everyone uses it regardless of the problem. I see three recurring error patterns in real projects:
Error 1 — Bedrock for internal productivity that Q already solves. A team spends weeks building a custom RAG over internal documentation, integrating with Kendra or OpenSearch, managing chunking, embeddings, and retrieval. Amazon Q Business does exactly this with native connectors for S3, SharePoint, Confluence, Salesforce, and 40+ other sources — with permission control based on IAM and Active Directory groups. The build cost isn't justified when the product already exists.
Error 2 — Plain Bedrock for complex agents. You start with a simple agent on Bedrock — one tool, one prompt, it works. Then you add session memory (you implement in DynamoDB), then retry on failing tools (you implement in the application), then observability (you instrument with X-Ray manually), then rate limiting on the gateway (you implement in API Gateway). In six months you've hand-built AgentCore, with bugs the managed service already fixed. AgentCore exists to prevent exactly this plumbing accumulation.
Error 3 — Self-hosted before validating volume. Teams that estimate Bedrock cost at scale and conclude 'it'll get expensive' jump straight to EKS + GPU without having validated the product. The operational cost of a production GPU cluster — with high availability, model updates, VRAM monitoring, spot interruption management — is real and ongoing. The rule: validate the product on Bedrock, migrate to self-hosted when the monthly Bedrock bill justifies the operational investment.
When SageMaker is non-negotiable
SageMaker is frequently underestimated in GenAI discussions because recent focus is on foundation LLMs. But there are four scenarios where it's the only correct answer:
1. Fine-tuning with sensitive proprietary data. If you need to adapt a model with data that cannot leave your AWS account — medical, financial, legal data — SageMaker offers training within your VPC, with encryption at rest and in transit, without data flowing to third-party infrastructure. Bedrock offers fine-tuning for some models, but with less control over the execution environment.
2. Models not in the Bedrock catalog. If your use case requires a specific model — a specialized domain LLM, a custom computer vision model, a time-series forecasting model — SageMaker is where you train and serve. JumpStart accelerates the starting point with pre-trained models from HuggingFace and other repositories.
3. Traditional ML pipelines with structured features. For XGBoost models, neural networks on tabular data, recommendation models — SageMaker Pipelines, Feature Store, and Model Monitor form a cohesive platform that Bedrock simply doesn't cover.
4. Inference latency with aggressive SLA. A dedicated SageMaker endpoint (non-serverless) offers predictable P99 latency because you control the instance. Bedrock on-demand has latency variability that may be unacceptable for cases like real-time scoring of financial transactions. Bedrock Provisioned Throughput mitigates this, but at a fixed cost that often makes SageMaker competitive.
The SageMaker trap is operational overhead: you manage endpoints, monitor drift, update models, configure auto-scaling. For teams without mature MLOps, this operational cost is systematically underestimated.
Head-to-Head Comparison: 5 Services Across 6 Dimensions
| Dimension | Bedrock | AgentCore | SageMaker AI | Amazon Q | |
|---|---|---|---|---|---|
| Best for | Consuming LLMs/embeddings via API; fast prototype to production | Production agents with tools, memory, gateway, and observability | Training/fine-tuning custom models; traditional ML; inference with aggressive SLA | Internal productivity: Q&A on corporate docs, code assistant, AWS assistant | High volume with prohibitive per-token cost; full data sovereignty; models not available in Bedrock |
| Ops effort | Low — no infrastructure to manage | Low-medium — managed runtime, but you configure tools and memory | High — endpoints, scaling, drift monitoring, model updates | Very low — connector and permission setup; no AI code | Very high — GPU cluster, CUDA, autoscaling, availability, security |
| Cost model | Pay-per-token (variable) or Provisioned Throughput (fixed per hour) | Pay-per-use runtime + cost of underlying Bedrock model | Per instance-hour (training + inference) + storage | Per user/month (predictable, SaaS-like) | Fixed GPU instance cost (high) + ops; better TCO at very high volume |
| Model control | Low — you use the model as-is; fine-tuning available for some models | Low — control over orchestration, not the model | High — you train, version, monitor, and replace the model | None — model managed by AWS | Total — you choose, operate, and update any model |
| When NOT to use | When you need a complex agent (use AgentCore); when Q already solves it; when volume makes per-token cost prohibitive | For simple LLM calls without orchestration; for internal productivity (use Q) | For consuming foundation models without customization (use Bedrock); for internal productivity (use Q) | For custom AI systems facing external customers; when you need control over the model or orchestration flow | Before validating the product and volume; when the team lacks capacity to operate GPU infrastructure |
| Recommended team maturity | Any team with AWS and API experience | Team with distributed systems and agent design experience | Team with MLOps or willingness to build that capability | Any team; ideal for teams without ML engineers | Senior team with Kubernetes, GPU ops, and cluster security experience |
Decision Matrix: Pros, Cons, and Verdict per Service
Amazon Bedrock
- Wide model catalog (Claude, Llama, Mistral, Titan, Cohere, Stability AI)
- Zero infrastructure to manage; automatic scaling
- Native guardrails, VPC endpoints, integrated CloudTrail
- Correct starting point for 80% of LLM use cases
- Per-token cost scales linearly — no automatic economies of scale
- Variable latency on on-demand; Provisioned Throughput has high fixed cost
- Limited model control; fine-tuning available for few models
Correct default for most projects. Start here.
Bedrock AgentCore
- Managed runtime for agents: tools, memory, gateway, observability out-of-the-box
- Eliminates plumbing you'd build manually (and incorrectly) with plain Bedrock
- Native integration with Bedrock ecosystem (models, guardrails, Knowledge Bases)
- Relatively new service (GA 2025) — fewer documented production use cases
- Adds runtime cost on top of Bedrock model cost
- Abstraction may limit very specific orchestration customizations
Use when your agent has more than one tool or needs memory. Don't reinvent AgentCore.
Amazon SageMaker AI
- Complete ML platform: training, feature store, pipelines, model registry, inference
- Full control over the model and execution environment
- Predictable inference latency with dedicated endpoints
- Fine-tuning with data that stays entirely in your account
- High operational overhead — you manage endpoints, scaling, drift, updates
- Steep learning curve for teams without MLOps
- Instance cost runs even when there's no traffic (dedicated endpoints)
Non-negotiable for custom models and traditional ML. Don't use to consume foundation LLMs.
Amazon Q
- Zero AI code — native connectors for 40+ corporate data sources
- Access control based on IAM and AD groups — user only sees what they have permission for
- Amazon Q Developer: code assistant integrated into IDE and AWS console
- Predictable cost per user/month — easy to justify to HR/finance
- No AI flow customization — you accept the product as-is
- Not suitable for AI systems facing external customers
- Dependency on available connectors — very custom sources require development
Vastly underused. If the problem is internal productivity, Q probably already solves it.
Decision Tree: Which AWS AI Service to Use
Each node is a qualification question. Follow the edges to the leaf — the recommended service. Questions are ordered by specificity: from most restrictive (custom training) to most general (LLM consumption).
- Novo caso · de uso de IA · New AI use case
- Q1: Precisa treinar · ou fine-tunar · modelo próprio? · Need to train/fine-tune · custom model?
- Q2: Caso de uso é · produtividade interna · (docs, código, console)? · Internal productivity · use case?
- Q3: Agente com tools, · memória ou · orquestração complexa? · Agent with tools, · memory, or complex · orchestration?
- Q4: Volume alto + · soberania de dados · ou custo por token · proibitivo? · High volume + data · sovereignty or · prohibitive token cost?
- SageMaker AI · Treinar · Servir · MLOps · Train · Serve · MLOps
- Amazon Q · Business / Developer · Produtividade interna · Internal productivity
- Bedrock AgentCore · Runtime de agentes · Agent runtime
- Self-hosted · EKS + GPU · Controle total · Full control
- Amazon Bedrock · API de LLM/Embeddings · LLM/Embeddings API · (default correto · correct default)
Qualification Checklist: 7 Questions Before Choosing the Service
- 1
1. Does the model I need exist in the Bedrock catalog?
Check at aws.amazon.com/bedrock. If it doesn't exist and you can't train an alternative, go to SageMaker or self-hosted. If it exists, continue.
- 2
2. Is the use case internal or external?
Internal (employees, devs, operations): evaluate Amazon Q before building anything. External (customers, partners, public APIs): Q doesn't apply — continue qualification.
- 3
3. Do I need fine-tuning with data that can't leave my account?
If yes: SageMaker is the safest option. Bedrock offers fine-tuning for some models, but with less environment control. Document the sovereignty requirement before deciding.
- 4
4. Does the system have multi-tool orchestration or persistent memory?
If yes: evaluate Bedrock AgentCore before building custom orchestration. Test whether AgentCore's abstractions meet your case — in most cases, they do.
- 5
5. What is the estimated token volume per month?
Calculate the monthly cost on Bedrock with the target model (aws.amazon.com/bedrock/pricing). If the cost is acceptable, use Bedrock. If prohibitive, compare with self-hosted TCO (GPU instance + ops + eng). Only migrate if the delta justifies it.
- 6
6. Are there regulatory data sovereignty constraints (LGPD, GDPR, financial sector)?
Bedrock operates within your AWS region and doesn't use your data to train models (by default). For harder constraints (data that cannot leave the VPC under any circumstances), self-hosted or SageMaker with VPC endpoint are the options.
- 7
7. Does the team have the capacity to operate the chosen service?
Be honest. Self-hosted without a senior platform team is guaranteed technical debt. SageMaker without MLOps is an orphaned endpoint in 6 months. Choose the service the team can operate with excellence, not the one that sounds most sophisticated.
Anti-patterns I repeatedly see in production
1. 'Let's go Bedrock' as an unreflective default. Bedrock is the correct starting point for LLMs — but not for internal productivity (use Q), not for complex agents without AgentCore (you'll reinvent the runtime), and not for traditional ML (use SageMaker). The question isn't 'Bedrock or not?' — it's 'what problem am I solving?' 2. Building custom RAG when Q already solves it. I've seen teams spend 6+ weeks building ingestion, chunking, embedding, and retrieval pipelines for internal documentation — when Amazon Q Business with an S3 or SharePoint connector would have delivered in days. Before building RAG, answer: 'Does Q Business cover this case?' Most of the time, it does. 3. Self-hosted before validating volume and product. Teams that estimate Bedrock cost at scale and jump to EKS + GPU without having validated that the product has traction. The cost of operating a GPU cluster in production — HA, updates, security, CUDA — is systematically underestimated. Validate the product on Bedrock. Migrate when the monthly bill justifies the operational investment. 4. Ignoring AgentCore and building orchestration by hand. Session memory in DynamoDB, tool retry in Lambda, rate limiting in API Gateway, manual traces in X-Ray — you're hand-building AgentCore, with bugs the managed service already fixed. If your agent has more than one tool, evaluate AgentCore before writing plumbing code.
Rule of Thumb
Start on Bedrock. Move up to AgentCore when it becomes an agent. Go to SageMaker or self-hosted only when cost or control justify it with real numbers. And before anything else: if the problem is internal productivity, ask whether Q already solves it — in most cases, it does.
After 16 years building production systems — including financial platforms where cost, latency, and data sovereignty are hard constraints — what surprises me most in the generative AI space is how quickly teams jump to the most complex solution without qualifying the problem. Amazon Q Business is the most underused service I know on AWS today. Most companies have an internal knowledge access problem — scattered documentation, outdated wikis, slow onboarding — and the instinctive answer is 'let's build a RAG with Bedrock'. Q solves this with native connectors, granular permission control, and zero AI code. The opportunity cost of not evaluating Q before building is real. At the other extreme, I see teams reaching for self-hosted before having 1000 active users. The argument is always 'per-token cost will scale'. Yes, it will — but the cost of operating a GPU cluster with high availability, model updates, cluster security, and spot interruption management also scales, and it's a fixed cost you pay regardless of traffic. Do the break-even calculation honestly before committing the team to GPU infrastructure operations. Bedrock AgentCore is the addition that most changes the equation for teams building agents in 2025. Before it, the choice was between plain Bedrock (you build the runtime) and external frameworks like LangChain or LlamaIndex (you manage dependencies and versions). AgentCore solves the managed runtime with native observability — and for financial or regulated systems, the traceability of each agent step is not optional. My practical recommendation: use the decision tree in this playbook as a checklist in every AI project inception meeting. The five qualification questions take less than 10 minutes and prevent months of rework.
Verdict
There is no universally correct AWS AI service — there is the right service for the right problem, operated by the right team. The decision tree is simple: internal productivity goes to Q; LLM via API goes to Bedrock; agent with orchestration goes to AgentCore; custom model goes to SageMaker; extreme volume with sovereignty goes to self-hosted. The mistake isn't choosing the wrong service out of naivety — it's choosing by hype, without asking the five qualification questions. Choose by the problem. The right service is the one your team can operate with excellence and that solves the user's problem at the lowest total cost — not the most sophisticated, not the newest, not the one that appeared at the last re:Invent.
Post-mortems, ADRs and architecture deep dives in your inbox — the way an architect reads them.
No spam · unsubscribe anytime
Ask Fernando about this
Get a focused answer about this study from my AI assistant, grounded in my work.
Join the conversation
Sign in to comment
Verify your email to join in — you'll also get the newsletter. No password.