Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

AI & Agents · Pattern Teardown

IaC for AI Platforms: Terraform and SageMaker Unified Studio

Jul 3, 2026 · 7 min · advanced · AI-assisted

fernando.moretes.com

Official Terraform support for SageMaker Unified Studio closes a critical gap: AI platforms can now be provisioned with the same IaC rigor applied to networks and databases. In this article I dissect the pattern, its modular anatomy, when it solves the right problem, and when it conceals dangerous technical debt.

Provisioning an AI platform through console clicks is the architectural equivalent of configuring a production database over SSH. It works the first time, breaks the second, and nobody knows what changed. Official Terraform support for Amazon SageMaker Unified Studio, announced on July 2, 2026, is not just operational convenience — it is the minimum condition for platform teams to treat data and AI environments with the same engineering contract already applied to VPCs, EKS clusters and data pipelines. I will dissect the pattern, its real anatomy and the places where it fails silently.

The Real Problem: AI Platforms Outside the Engineering Contract

In regulated financial environments — where every provisioned resource requires an audit trail, change approval and rollback capability — the absence of IaC for AI platforms creates a second-class infrastructure category. MLOps teams build sophisticated pipelines with Step Functions, MSK and Glue, but the SageMaker domain hosting all of it was created manually by an administrator eighteen months ago. Nobody knows exactly which IAM roles were created, which blueprints are active, or whether the staging environment is truly identical to production.

This gap is not cosmetic. When a central bank or SEC auditor asks for evidence that the production credit-model environment was configured according to the approved security policy, the answer "the domain was created manually" is a non-conformance finding. SageMaker Unified Studio aggregates critical services — Amazon DataZone for catalog governance, Amazon EMR for distributed processing, Amazon Redshift for analytics, Amazon Bedrock for GenAI — and each of those services has security configurations that must be auditable and reproducible.

The terraform-aws-sagemaker-unified-studio module addresses this through a composable approach: a root module that provisions the domain with managed IAM roles, and independent sub-modules for blueprints, project profiles and projects. This separation of responsibilities is deliberate and important — it allows different teams to control different platform layers without unnecessary coupling in Terraform state.

Pattern Anatomy: Terraform Modules for SageMaker Unified Studio

[Figure: Provisioning flow showing the modular composition of terraform-aws-sagemaker-unified-studio, from CI/CD pipeline to provisioned resources across multiple AWS accounts. Groups shown: IaC Pipeline (Git repo with module source; CI/CD via GitHub Actions or CodePipeline; S3 + DynamoDB remote state and lock), Root Module (SageMaker Unified Studio domain; IAM roles provisioned by the module; Cloud Control API provider), Sub-Modules (Blueprint; Project Profile; Project), AWS Services in the domain (Amazon DataZone catalog; Amazon Bedrock GenAI; Amazon EMR distributed processing).]

Module Anatomy: Cloud Control API and the Cost of Abstraction

The most important technical detail in this launch is not the module itself, but the layer that enables it: the Terraform AWS Cloud Control Provider (hashicorp/awscc). Unlike the traditional hashicorp/aws provider, which maps AWS resources to Terraform resources with per-service custom logic, the Cloud Control Provider uses AWS's unified Cloud Control API to create, read, update and delete resources. This means new resource types become available in Terraform much faster, without waiting for hand-written support to land in the traditional provider.

The practical consequence is twofold. On the positive side, resources like the SageMaker Unified Studio domain, which have managed lifecycles and complex configurations, reach Terraform before the traditional provider supports them. On the negative side, the Cloud Control Provider has different drift detection semantics: it relies on the Cloud Control API's read handler, which for some resources returns only a subset of configurable attributes. This means terraform plan may report "no changes" even when configurations outside the handler's scope were manually altered in the console — exactly the kind of silent drift that destroys IaC reliability in financial environments.

To mitigate this in production, I combine the module with AWS Config Rules that detect drift in DataZone and SageMaker resources, and with CloudTrail + EventBridge to alert on any API call that modifies the domain outside the Terraform pipeline. The SLO here is not "zero drift" — it is "drift detected in under 15 minutes". This distinction matters: chasing zero drift via Terraform alone is more brittle than detecting and correcting drift quickly with proper observability.
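
The detection side of that SLO can be sketched as follows. This is a minimal illustration, assuming CloudTrail-style records arrive through EventBridge; the pipeline role ARN, the set of watched event sources and both helper names are hypothetical, and the record is reduced to the few fields used here.

```python
from datetime import datetime, timedelta

# Hypothetical ARN of the role the Terraform pipeline assumes.
PIPELINE_ROLE_ARN = "arn:aws:iam::111111111111:role/terraform-pipeline"
# Assumed event sources that cover domain resources.
WATCHED_SOURCES = {"datazone.amazonaws.com", "sagemaker.amazonaws.com"}
# Target from the text: drift detected in under 15 minutes.
DRIFT_SLO = timedelta(minutes=15)


def is_out_of_band_change(record: dict) -> bool:
    """True for a mutating call on a watched service not issued by the pipeline role."""
    if record.get("eventSource") not in WATCHED_SOURCES:
        return False
    if record.get("readOnly", False):
        return False  # read-only calls cannot introduce drift
    issuer = (
        record.get("userIdentity", {})
        .get("sessionContext", {})
        .get("sessionIssuer", {})
        .get("arn")
    )
    # Callers without a session issuer (for example IAM users) count as out of band.
    return issuer != PIPELINE_ROLE_ARN


def within_drift_slo(event_time: datetime, alert_time: datetime) -> bool:
    """True when the alert fired less than 15 minutes after the API call."""
    return alert_time - event_time < DRIFT_SLO
```

Measuring the gap between the CloudTrail event time and the alert time turns "drift detected in under 15 minutes" into a number that can be tracked per incident rather than asserted.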

When to Use This Pattern: Validity Conditions

This pattern solves a specific problem: provisioning and managing SageMaker Unified Studio domains as versioned, auditable and reproducible infrastructure across multiple accounts and environments. The conditions that make it the right choice are:

1. Multiple environments with required parity. If you have dev, staging and production for ML/AI workloads, and compliance requires that the production configuration be derivable from the same source of truth as staging, the Terraform module delivers this in an auditable way. The ability to create projects with existing IAM roles, rather than always creating new ones, is critical here: in financial environments, IAM roles go through a separate approval process and cannot be created ad hoc by the platform pipeline.

2. Platform teams separate from product teams. The modular design of terraform-aws-sagemaker-unified-studio — domain in the root module, blueprints and project profiles in independent sub-modules — maps directly to the responsibility model where a Platform Engineering team controls the domain and IAM, while data teams control their own project profiles and projects. This is Data Mesh at the infrastructure level: data domains with controlled autonomy.

3. Need for documented disaster recovery. A manually created SageMaker Unified Studio domain has no defined RTO — nobody knows how long it takes to recreate everything from scratch. With the Terraform module, the environment recreation RTO is the time of terraform apply, which for a typical domain runs between 8 and 20 minutes depending on enabled blueprints. That is a real number that can enter your BIA (Business Impact Analysis).

The pattern is not the right choice when you have a single experimentation environment with no audit requirements, or when the team lacks sufficient Terraform maturity to safely manage remote state, modules and workspaces.
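
To make the recreation RTO from condition 3 concrete, here is a small sketch that derives it from recorded apply durations. The step names and every duration below are illustrative placeholders, not measurements; the assumption is that a full rebuild applies each component in sequence.

```python
from datetime import timedelta

# Placeholder durations only; replace with timings recorded by your own pipeline.
apply_durations = {
    "domain_and_iam": timedelta(minutes=9),
    "blueprints": timedelta(minutes=6),
    "project_profiles": timedelta(minutes=2),
    "projects": timedelta(minutes=1),
}


def recreation_rto(durations: dict) -> timedelta:
    """Rebuild time, assuming the components are applied one after another."""
    return sum(durations.values(), timedelta())


def meets_target(durations: dict, target: timedelta) -> bool:
    """True when the summed apply time fits inside the RTO written into the BIA."""
    return recreation_rto(durations) <= target


print(recreation_rto(apply_durations))
```

Recording these durations on every pipeline run keeps the figure in the BIA tied to evidence instead of an estimate made once.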

Anti-Patterns: Where This Pattern Fails Silently

Monolithic state per account. Putting the domain, all blueprints, all project profiles and all projects into a single terraform apply creates a massive blast radius. A change in one data team's project profile blocks the apply of a critical domain IAM update.
Ignoring silent drift from the Cloud Control Provider. As discussed, the Cloud Control API read handler does not cover all attributes. Relying solely on terraform plan for drift detection is insufficient. Without AWS Config Rules and CloudTrail alerting covering domain resources, you will have manually altered security configurations that Terraform does not detect.
Module-created IAM roles with excessive permissions in production. The module automatically provisions IAM roles, which is convenient but dangerous if accepted without review. In financial environments, every IAM role accessing sensitive data must go through a least-privilege review.
No credential separation between environments. Using the same AWS provider configured with the same credentials for dev and production across different Terraform modules is a classic blast radius mistake. Configure providers with assume_role for environment-specific roles, and use OIDC with GitHub Actions or CodeBuild to eliminate long-lived credentials from the pipeline.
Blueprints enabled without cost baseline. Each blueprint enabled in SageMaker Unified Studio can provision underlying resources (EMR clusters, Redshift endpoints, Bedrock connections) with ongoing cost. Enabling all available blueprints in dev "to experiment" and forgetting to destroy them is a real cost vector.
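
The credential-separation anti-pattern above lends itself to an automated check. A minimal sketch, assuming the pipeline configuration can be reduced to a mapping from environment name to the role ARN it assumes; the account IDs, role names and the helper are hypothetical.

```python
from collections import defaultdict


def shared_roles(env_roles: dict) -> dict:
    """Return every role ARN that more than one environment assumes."""
    by_role = defaultdict(list)
    for environment, role_arn in env_roles.items():
        by_role[role_arn].append(environment)
    return {arn: sorted(envs) for arn, envs in by_role.items() if len(envs) > 1}


# Hypothetical mapping: dev and staging wrongly reuse the same role.
env_roles = {
    "dev": "arn:aws:iam::111111111111:role/terraform-nonprod",
    "staging": "arn:aws:iam::111111111111:role/terraform-nonprod",
    "prod": "arn:aws:iam::333333333333:role/terraform-prod",
}

print(shared_roles(env_roles))
```

Running a check like this in CI and failing the build on any non-empty result is one way to keep the separation from eroding over time.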

Reference Design: Multi-Account AI Platform with Financial Governance

For a bank or asset manager that needs to operate credit models, fraud detection and portfolio analysis in a regulated environment, the reference design I would apply combines the Terraform module with an AWS Organizations account strategy and layered governance controls.

Account structure: Platform tooling account (where remote Terraform state lives in S3 with versioning and KMS CMK, and the DynamoDB lock table uses BillingMode: PAY_PER_REQUEST with point-in-time recovery enabled), AI/ML production account, staging account and development account. The CI/CD pipeline assumes roles via OIDC in each account — never uses access keys.

Terraform state layers: Layer 0 — SageMaker Unified Studio domain + IAM roles (managed by Platform Engineering team, apply requires manual approval in pipeline). Layer 1 — enabled blueprints (managed by Platform Engineering team with cost review). Layer 2 — project profiles (managed by tech leads of each data domain). Layer 3 — individual projects (managed by product teams with controlled self-service).
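
The four layers can be modeled as data, which makes ownership and state boundaries explicit. This is a sketch under stated assumptions: the state key layout and the rule that a change affects its own layer plus every layer above it are illustrative choices, and only Layer 0 is marked as needing manual approval because that is the only approval the design names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateLayer:
    level: int
    name: str
    owner: str
    manual_approval: bool


LAYERS = [
    StateLayer(0, "domain-and-iam", "Platform Engineering", True),
    StateLayer(1, "blueprints", "Platform Engineering", False),
    StateLayer(2, "project-profiles", "Data domain tech leads", False),
    StateLayer(3, "projects", "Product teams", False),
]


def state_key(environment: str, layer: StateLayer) -> str:
    """One remote state object per environment and layer (hypothetical key layout)."""
    return f"{environment}/layer{layer.level}-{layer.name}/terraform.tfstate"


def layers_to_recheck(changed_level: int) -> list:
    """Assumed rule: a change can affect its own layer and every layer built on it."""
    return [layer for layer in LAYERS if layer.level >= changed_level]


for layer in LAYERS:
    print(state_key("prod", layer), "owned by", layer.owner)
```

Under this rule a Layer 3 project change touches exactly one state file, while a Layer 0 change calls for a recheck of all four, which is the blast-radius asymmetry the layering is meant to expose.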

Specific security controls: Dedicated KMS CMK per environment for domain data encryption, with kms:ViaService condition restricting use to SageMaker and DataZone. SCPs in AWS Organizations blocking manual SageMaker domain creation outside the pipeline (Deny on sagemaker:CreateDomain except for the Terraform pipeline role). Custom AWS Config Rule verifying that all SageMaker domains have the ManagedBy: terraform tag — any domain without this tag triggers a P1 alert.
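
The SCP described above could look roughly like the policy document built below. Treat it as a sketch: the role name pattern is hypothetical, and datazone:CreateDomain is added next to sagemaker:CreateDomain on the assumption that Unified Studio domains are created through the DataZone API, which should be verified against CloudTrail in your own accounts.

```python
import json


def deny_manual_domain_creation(pipeline_role_arn_pattern: str) -> dict:
    """Build an SCP that denies domain creation to every principal except the pipeline role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDomainCreationOutsidePipeline",
                "Effect": "Deny",
                # datazone:CreateDomain is an assumption added for Unified Studio domains.
                "Action": ["sagemaker:CreateDomain", "datazone:CreateDomain"],
                "Resource": "*",
                "Condition": {
                    "ArnNotLike": {"aws:PrincipalArn": pipeline_role_arn_pattern}
                },
            }
        ],
    }


# Hypothetical role name; the wildcard account ID covers every member account.
policy = deny_manual_domain_creation("arn:aws:iam::*:role/terraform-pipeline")
print(json.dumps(policy, indent=2))
```

Generating the document from code keeps the exempted role in one place, so the SCP and the pipeline configuration cannot quietly diverge.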

Observability: CloudWatch Dashboard with aws/sagemaker/unified-studio namespace metrics, CloudTrail Lake with saved query to detect manual domain modifications, and cost per project via Cost Allocation Tags mapped in project sub-modules. This last point is frequently overlooked: the Terraform module is the right place to ensure cost allocation tags are applied consistently across all domain resources.
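
Consistent cost allocation tags are easy to verify mechanically. A minimal sketch, assuming tags are available as plain key/value mappings; ManagedBy comes from the design above, while the other required keys and both helper names are hypothetical.

```python
# ManagedBy is taken from the text; the remaining keys are assumed for illustration.
REQUIRED_TAGS = ("ManagedBy", "CostCenter", "Project", "Environment")


def project_tags(base: dict, project: str, cost_center: str) -> dict:
    """Merge environment-wide tags with the tags specific to one project."""
    return {**base, "Project": project, "CostCenter": cost_center}


def missing_tags(resource_tags: dict) -> list:
    """Required keys that are absent or empty on a resource."""
    return [key for key in REQUIRED_TAGS if not resource_tags.get(key)]


base = {"ManagedBy": "terraform", "Environment": "prod"}
tags = project_tags(base, "credit-scoring", "cc-1234")
print(tags, missing_tags(tags))
```

The same list of required keys can feed both the project sub-module inputs and the compliance check, so cost reports and alerts agree on what "tagged" means.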

Well-Architected Lenses for this Pattern

Security

Use create_iam_roles = false in production and supply pre-approved roles. Apply KMS CMK with kms:ViaService condition. Block manual domain creation via SCP. Enable CloudTrail for all domain API calls.

Reliability

Separate Terraform state by layer to reduce blast radius. Enable S3 versioning and DynamoDB PITR for remote state. Document domain recreation RTO as a platform reliability metric.

Performance efficiency

Enable only necessary blueprints per environment — unnecessary blueprints increase apply time and cost of underlying resources. Measure terraform apply time per layer as a platform SLI.

State Separation Matters More than DRY

The temptation with well-structured Terraform modules is to consolidate everything into a single apply to "simplify". Resist. In AI platforms with multiple teams, the SageMaker domain and individual projects have completely different lifecycles — the domain changes rarely and with high impact, projects change frequently and with isolated impact. Separating into distinct state files is not bureaucracy: it is the difference between a 30-second terraform apply to create a project and a 20-minute one that can destroy the entire domain if something goes wrong.

Provisioning Approaches: Terraform vs. Alternatives

Criterion	Terraform + official module	AWS CDK	Manual CloudFormation	Manual console
Auditability	✅ Git history + state	✅ Git history + CDK context	⚠️ Stack events, no readable diff	❌ CloudTrail only
Drift detection	⚠️ Partial via Cloud Control API	⚠️ Depends on L1 construct	✅ Native CloudFormation drift detection	❌ None
Integration with existing pipelines	✅ Mature Terraform ecosystem	⚠️ Requires Node.js in pipeline	✅ AWS native	❌ Not applicable
Multi-cloud / portability	✅ Portable Terraform state	⚠️ AWS-centric	❌ AWS only	❌ AWS only
Learning curve for data teams	⚠️ Moderate HCL	✅ Familiar language (Python/TypeScript)	❌ Verbose YAML/JSON	✅ Familiar UI

My Curation Note

Senior Solutions Architect

I have applied the IaC pattern for data platforms in environments where the cost of a misconfigured domain is not technical — it is regulatory. What I learned in practice is that the Terraform module is a necessary but not sufficient condition: without SCPs blocking manual creation and without AWS Config Rules covering domain resources, teams will create domains outside the pipeline "just to test" and those domains will survive to production. The second point I would emphasize: state separation by layer is not optional in teams with more than two squads using the platform — it is the only way to give autonomy without creating deploy coupling. Finally, support for existing_iam_roles is the most important feature of this module for financial environments: without it, the module would be unusable in any organization with an IAM approval process separate from the platform pipeline.

Verdict: Adopt, with Explicit Governance Controls

Adopt with controls

The terraform-aws-sagemaker-unified-studio module is the correct approach for any organization operating SageMaker Unified Studio in regulated environments or with multiple teams. The composable modular design solves the real problem of responsibility separation between Platform Engineering and product teams. The Cloud Control Provider dependency is a known limitation that requires compensating drift detection controls — it is not a reason to reject the pattern, but it is a reason not to blindly trust terraform plan as the sole source of truth about domain state. For financial environments: use existing_iam_roles in production, separate state by layer, block manual creation via SCP, and instrument CloudTrail + AWS Config for drift coverage. With these controls, this pattern delivers what it promises: AI platforms with the same engineering rigor you already apply to the rest of your infrastructure.

References

AWS What's New: Amazon SageMaker Unified Studio now supports Terraform for provisioning (Jul 2, 2026)
GitHub: terraform-aws-sagemaker-unified-studio (aws-ia)
AWS Blog: Quickly adopt new AWS features with the Terraform AWS Cloud Control Provider
AWS Blog: Deploy Amazon SageMaker Projects with Terraform Cloud (May 2025)
AWS Blog: Amazon SageMaker Domain in VPC only mode with Terraform (Sep 2023)
AWS Docs: SageMaker Unified Studio Administrator Guide, Supported Regions
AWS Docs: AWS Control Tower AFT Provisioning Framework
AWS Architecture Center: AWS Well-Architected Framework

Analyzed source: Amazon SageMaker Unified Studio now supports Terraform for provisioning
