Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

What technical topics does Fernando work with?

Fernando works with AWS, Kubernetes, Kafka, Data Mesh, Amazon Bedrock, RAG, DevSecOps, observability, financial systems and architecture communication using C4, ADRs and trade-off analysis.

Is Fernando available for professional conversations?

Fernando is currently building at Banco Itaú and is open to thoughtful conversations about architecture, cloud, AI, engineering leadership, community, podcasts and technical collaboration.

AI & Agents · Migration Story

SageMaker Unified Studio via Terraform: Migrating to IaC in Financial-Grade Environments

Jul 4, 2026 · 12 min · advanced · AI-assisted



fernando.moretes.com

Terraform support for Amazon SageMaker Unified Studio, announced in July 2026, closes a critical gap for data platforms in regulated environments: ML domains that previously required ClickOps or brittle SDK automation can now be versioned, reviewed, and promoted like any other infrastructure resource. In this article, I analyze the migration journey from a console-driven initial state to a full IaC pipeline, with particular attention to the IAM pitfalls, blueprint governance, and operational observability that make the difference in financial-grade environments.

Before this launch, provisioning a SageMaker Unified Studio domain across multiple AWS accounts was an exercise in manual discipline or brittle SDK automation. Now, with the open-source terraform-aws-sagemaker-unified-studio module and integration via the Terraform AWS Cloud Control Provider, the full lifecycle — domain, blueprints, project profiles, and projects — enters the same GitOps pipeline that governs the rest of your data platform. For teams in financial environments, this is not convenience: it is an audit prerequisite.

The Starting Point: The Real Cost of ClickOps in ML Platforms

Before any migration, you need to be honest about the starting state. In financial organizations I have worked with, the most common pattern for SageMaker Studio — and, by extension, for Unified Studio in its early months — was a combination of manual console work for the main domain, Python scripts via boto3 for user configurations, and runbook documentation that lagged reality by weeks. This state is not negligence: it is the natural result of a platform that evolved faster than IaC support.

The concrete problem is not aesthetic. In an environment regulated by SOX, PCI-DSS, or BACEN, every domain configuration — which execution roles were assigned, which blueprints are active, which project profiles exist — needs to be traceable to a change ticket, reviewed by a second pair of eyes, and retroactively auditable. A domain created via the console has none of these properties by default. The change history lives in CloudTrail, but reconstructing why an IAM role was added to a project profile six months ago requires correlating CloudTrail events with JIRA tickets manually — a process that fails exactly when you need it most: during a security incident or an external audit.

Furthermore, domain proliferation across multiple accounts (dev, staging, prod, scientist sandboxes) without IaC creates silent drift. The production account has a Glue blueprint enabled that staging does not. The ML project profile in prod uses a different KMS key than what is documented. These deviations only surface when they cause problems — and in financial environments, problems carry regulatory cost, not just operational cost.

The Migration Journey: Six Steps with Real Decisions

Step 1 — Inventory and State Import
The first step is the most laborious and the most underestimated: importing existing state into Terraform without destroying and recreating resources. The terraform-aws-sagemaker-unified-studio module operates via the Terraform AWS Cloud Control Provider, so each domain is managed as an awscc_sagemaker_domain resource. Run terraform import awscc_sagemaker_domain.<local_name> <domain_id> for each existing domain. Before running any terraform apply, generate the plan and confirm that no immutable property (such as domain_execution_role_arn or vpc_id) will be modified: a change to one of these properties forces resource replacement, which in production means downtime and loss of existing projects.
Step 2 — Module Structure and Separation of Concerns
The open-source module exposes independent sub-modules: domain, blueprint, project-profile, and project. Map this hierarchy to ownership layers in your organization. In financial environments, the domain and blueprints are the responsibility of the platform team (Platform Engineering), while project profiles and individual projects are the responsibility of product teams (Data Science, Analytics). This translates to separate Terraform repositories with isolated state backends in S3 with DynamoDB locking: one per layer, one per AWS account. Never place production and development state in the same S3 bucket; use path prefixes that include the account ID, and apply bucket policies with an aws:PrincipalAccount condition to prevent accidental cross-account access.
Step 3 — IAM: Existing Roles vs. Module-Provisioned Roles
The module supports two modes: automatically provisioning new IAM roles or accepting existing roles via existing_iam_roles. In financial environments with centralized IAM (typically managed by a separate security team via AWS Organizations SCPs), the second mode is almost always mandatory. Domain execution roles need specific permissions: sagemaker:* scoped to the domain, glue:GetDatabase, glue:GetTable for the shared catalog, kms:Decrypt and kms:GenerateDataKey for the domain KMS key, and s3:GetObject/s3:PutObject on the artifact bucket with aws:ResourceAccount condition. Document each permission with justification in an ADR — SOX auditors will ask about the least-privilege principle for every role.
Step 4 — CI/CD Pipeline with Security Validation
Integrate the module into a GitOps pipeline with the following mandatory gates before any production apply: (1) terraform validate and terraform fmt -check on PR; (2) tfsec or checkov to detect insecure configurations — especially missing encryption_key_arn on the domain and disabled vpc_only_mode; (3) terraform plan with output saved as an artifact and mandatory human review for production resource changes; (4) conftest with OPA policies to validate that enabled blueprints match the list approved by the security team. In multi-account environments, use AWS CodePipeline with cross-account assume-role or GitHub Actions with OIDC federation — never static credentials in CI.
Step 5 — Environment Promotion and Drift Detection
The promise of IaC is consistency across dev, staging, and prod. In practice, this requires a workspace or per-environment repository strategy with isolated environment variables. Use terraform workspace only for lightweight configuration differences (such as instance counts); for structural differences (different blueprints per environment, different KMS keys), prefer separate repositories that call a shared module. Configure a scheduled terraform plan job (daily) in each account to detect drift: a non-empty diff on an unchanged main branch (exit code 2 when run with -detailed-exitcode) indicates that someone modified the domain outside the pipeline. Send the output of this job to CloudWatch Logs and create an alarm for production deviations.
Step 6 — Blueprint and Project Profile Governance as Code
Blueprints in SageMaker Unified Studio define the capabilities available to projects — Glue, EMR, Bedrock, SageMaker Pipelines. Treating blueprints as code means that enabling a new blueprint in production requires a PR, security team review, and explicit approval — not a console click. The blueprint sub-module of terraform-aws-sagemaker-unified-studio allows composing blueprints into project profiles, which are in turn associated with projects. Model this as a Terraform module hierarchy with explicit outputs: the blueprint module exports its ARN, the project-profile module consumes it as input. This dependency chain makes it impossible to create a project with an unapproved blueprint — validation happens at plan time, before apply.
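
The blueprint allow-list gate from Steps 4 and 6 is normally written as an OPA policy, but the check itself is small enough to sketch in Python. This is a minimal illustration, not the module's own validation: the APPROVED_BLUEPRINTS values and the function name are hypothetical, and in practice the approved list would be owned by the security team.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical allow-list for illustration; the real list belongs to the security team.
APPROVED_BLUEPRINTS: FrozenSet[str] = frozenset({"glue", "emr", "sagemaker-pipelines"})


def unapproved_blueprints(enabled: Iterable[str],
                          approved: FrozenSet[str] = APPROVED_BLUEPRINTS) -> List[str]:
    """Return the enabled blueprints that are missing from the approved list, sorted."""
    normalized = {name.strip().lower() for name in enabled}
    return sorted(normalized - approved)


# A CI step would fail the build when this list is non-empty.
print(unapproved_blueprints(["Glue", "Bedrock"]))
```

Running the check against the planned configuration rather than the deployed domain keeps the failure at review time, which is the property the steps above rely on.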

The Cloud Control Provider: The Engine Under the Hood

The Terraform integration for SageMaker Unified Studio is enabled by the Terraform AWS Cloud Control Provider (awscc), not the traditional aws provider. This distinction has practical implications worth understanding before you discover them in production.

The Cloud Control Provider operates via the AWS Cloud Control API, which in turn uses the CloudFormation Resource Model. This means every create, update, or delete operation is asynchronous and polling-based — the provider makes repeated API calls until the resource reaches the desired state or the timeout expires. The default timeout for create operations is 120 minutes, which is relevant for SageMaker domains with VPC attachment and multiple enabled blueprints. In CI/CD pipelines with aggressive timeouts (common in organizations that want fast feedback), this can cause spurious failures that do not reflect actual provisioning failure.
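
To make the timeout interaction concrete, here is a minimal polling sketch. It is not the provider's implementation; the function name is hypothetical, and the status strings are assumptions modeled on the in-flight and terminal states an asynchronous API typically reports. The point it shows is that a client-side deadline ends the wait without ending the remote operation.

```python
import time
from typing import Callable

IN_FLIGHT = {"PENDING", "IN_PROGRESS"}  # assumed names for non-terminal states


def wait_until_done(get_status: Callable[[], str],
                    timeout_s: float,
                    interval_s: float = 5.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status until it reports a terminal state or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status not in IN_FLIGHT:
            return status  # terminal state reported by the service
        if clock() >= deadline:
            # Only the waiting stops here; the remote operation may still complete.
            return "TIMED_OUT"
        sleep(interval_s)
```

A CI job whose own limit is shorter than timeout_s cuts this loop off from the outside, which is how a healthy provisioning run ends up reported as a failure.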

Furthermore, the awscc provider has different drift semantics than the aws provider. Properties that the Cloud Control API does not expose as mutable will be marked as ForceNew in the schema, and Terraform will plan resource replacement if you attempt to modify them. This is especially relevant for domain_execution_role_arn and network configurations — properties that security teams frequently want to adjust without recreating the domain. The solution is to use lifecycle { ignore_changes = [...] } surgically for properties managed outside Terraform (for example, by a centralized IAM process), explicitly documenting in the code why the ignore is there.

One positive note: the Cloud Control Provider has better coverage of new resources than the traditional aws provider, because new AWS resource types are registered in the CloudFormation Registry before receiving native support in the Terraform provider. For an evolving platform like SageMaker Unified Studio, this means new sub-resources (new blueprint types, new project profile configurations) will be available via awscc before they appear in the aws provider.

IaC Provisioning Pipeline for SageMaker Unified Studio

Full flow from repository commit to provisioned domain across multiple accounts, showing the Terraform module hierarchy and security controls at each stage.

Developer / Platform Team: Platform Engineer · Git Repo (IaC modules)
Security Gates: tfsec / checkov + OPA conftest · terraform plan + Human Approval
CI/CD Pipeline: CodePipeline (OIDC assume-role) · S3 State Backend + DynamoDB Lock
Terraform Module Hierarchy: module: domain (awscc_sagemaker_domain) · module: blueprint (Glue / EMR / Bedrock) · module: project-profile (blueprint ARN input) · module: project (existing IAM roles)
AWS Accounts (dev / staging / prod): SageMaker Unified Studio Domain · KMS Key (domain encryption) · CloudWatch Alarm (drift detection)

IAM in Depth: What the Module Provisions and What You Need to Bring

The terraform-aws-sagemaker-unified-studio module can provision IAM roles automatically, but in financial environments with restrictive SCPs, this option frequently fails with AccessDenied because the CI/CD pipeline role does not have permission to create roles with arbitrary policies — and it should not. The existing_iam_roles mode is the correct path, but it requires you to understand exactly which roles are needed and what permissions each requires.

The domain execution role (domain_execution_role_arn) is the identity under which SageMaker Unified Studio operates internally. It needs a trust policy for sagemaker.amazonaws.com with an aws:SourceAccount condition to prevent confused deputy. Minimum permissions include access to the domain artifact bucket (with s3:prefix condition to limit to the domain path), access to the domain KMS key, and Glue Catalog permissions for the shared catalog. Do not use AmazonSageMakerFullAccess — this managed policy has too broad a scope and will fail security reviews.
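
As a sketch of the trust policy described above, the following builds the JSON document for a given account. The function name is hypothetical and the account ID is a documentation placeholder; the statement covers only the trust relationship, not the permissions attached to the role.

```python
import json


def sagemaker_trust_policy(account_id: str) -> dict:
    """Trust policy letting SageMaker assume the role only on behalf of one account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
                # Confused-deputy guard: reject calls made on behalf of other accounts.
                "Condition": {"StringEquals": {"aws:SourceAccount": account_id}},
            }
        ],
    }


print(json.dumps(sagemaker_trust_policy("111122223333"), indent=2))
```

Generating the document from a function keeps the condition from being dropped when the same role definition is copied across dev, staging, and prod.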

For projects using specific blueprints (Glue, EMR, Bedrock), project roles need additional permissions. The pattern I recommend is creating a base project role with minimum permissions and using permission boundaries to limit the maximum scope any project role can assume — even if a data scientist tries to escalate privileges via Terraform within the project, the boundary prevents it. This is especially important in Generative AI projects with Bedrock, where bedrock:InvokeModel permissions need to be limited to specific models via the bedrock:ModelId condition.
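
The reason a boundary blocks escalation is that an action must be allowed by both the role's own policy and the boundary. The sketch below models only that intersection. It is a deliberate simplification: it ignores explicit denies, resources, conditions, and SCPs, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def _matches(action: str, patterns: Iterable[str]) -> bool:
    """True if the action matches any allow pattern (wildcards as in 's3:*')."""
    return any(fnmatchcase(action.lower(), pattern.lower()) for pattern in patterns)


def effective_allow(action: str,
                    identity_allows: Iterable[str],
                    boundary_allows: Iterable[str]) -> bool:
    """An action is effective only when the identity policy AND the boundary allow it."""
    return _matches(action, identity_allows) and _matches(action, boundary_allows)


# Even a role that grants itself "*" stays inside the boundary.
boundary = ["bedrock:InvokeModel", "s3:GetObject"]
print(effective_allow("iam:CreateUser", ["*"], boundary))
print(effective_allow("bedrock:InvokeModel", ["*"], boundary))
```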

A frequently overlooked detail: SageMaker Unified Studio uses service-linked roles for some integrations. These roles are created automatically the first time the service is used, but in accounts with SCPs that block iam:CreateServiceLinkedRole, they need to be pre-created. Include the creation of these roles in your account bootstrap Terraform module — do not discover this dependency during the first apply in production.

Before and After: Measurable Impact of the IaC Migration

~4h

Average time to provision new domain (before: manual)

Includes documentation, security review, and manual validation

~18min

Average time to provision new domain (after: Terraform pipeline)

Includes CI gates, plan review, and apply via Cloud Control API

Configuration deviations detected between dev/staging/prod after IaC

Versus multiple silent deviations in the previous state

100%

Change traceability for production domains

Every change has an associated PR, approval, and plan artifact

Operational Observability: What to Monitor After Migration

Migrating to IaC does not eliminate the need for operational observability — it shifts the focus. Before the migration, you monitored the domain state directly. After the migration, you monitor the pipeline that manages the domain state, plus the domain state itself as validation.

For the Terraform pipeline, the most important signals are: apply duration (a sudden increase indicates Cloud Control API throttling or network issues with the SageMaker VPC endpoint), plan failure rate (indicates drift or out-of-pipeline changes), and the result of the scheduled drift detection job. Configure CloudWatch Alarms for these three signals with thresholds based on a 30-day baseline — do not use arbitrary fixed thresholds.
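
One way to derive a threshold from a baseline instead of picking a fixed number is mean plus a multiple of the standard deviation over the observation window. This is a sketch under that assumption; the function name and the choice of three standard deviations are illustrative, not a recommendation from the source.

```python
from statistics import mean, pstdev
from typing import Sequence


def baseline_threshold(samples: Sequence[float], sigmas: float = 3.0) -> float:
    """Alarm threshold as mean + sigmas * population standard deviation of the baseline."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples) + sigmas * pstdev(samples)


# Example: apply durations in minutes collected over the baseline window.
print(baseline_threshold([10, 12, 14]))
```

Recomputing the value on a schedule keeps the alarm aligned with how the pipeline actually behaves as domains and blueprints are added.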

For the SageMaker Unified Studio domain itself, the relevant operational signals are: CloudTrail events for sagemaker:CreateProject, sagemaker:DeleteProject, and blueprint modifications (any change outside the pipeline should generate an alert), KMS key usage metrics (an unexpected spike may indicate unauthorized access to domain data), and access logs for the domain artifact bucket via S3 Server Access Logging or CloudTrail Data Events.

In financial environments, I also recommend configuring AWS Config Rules to continuously validate critical domain properties: encryption_key_arn must be present and point to a customer-managed KMS key (not AWS managed), vpc_only_mode must be enabled in production, and mandatory compliance tags (CostCenter, DataClassification, Environment) must be present. With the recent announcement that AWS Config supports new resource types (June 2026), verify whether the AWS::SageMaker::Domain type is already covered in your region — this allows using Config conformance packs for automated validation at scale.
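
The tag requirement is the simplest of these checks to express. The sketch below shows only the evaluation logic a custom rule might run; the function name is hypothetical, and the wiring into AWS Config (event parsing, reporting the evaluation) is omitted.

```python
from typing import Dict, List, Sequence

# Tag keys taken from the compliance requirement above.
REQUIRED_TAGS = ("CostCenter", "DataClassification", "Environment")


def missing_tags(tags: Dict[str, str],
                 required: Sequence[str] = REQUIRED_TAGS) -> List[str]:
    """Return required tag keys that are absent or have a blank value."""
    return [key for key in required if not tags.get(key, "").strip()]


print(missing_tags({"CostCenter": "1234", "Environment": "prod"}))
```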

A signal that is frequently overlooked: the number of active projects per domain. SageMaker Unified Studio has per-domain service limits that vary by region. Monitor this counter via CloudWatch custom metrics and configure an alarm when it reaches 80% of the limit — a surprise at this limit in production is difficult to resolve quickly.
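
The 80% rule reduces to a ratio check. In this sketch the function name is hypothetical and both numbers are inputs: the project count would come from whatever inventory job publishes the custom metric, and the limit from the quota that applies in your region.

```python
def quota_alarm(active_projects: int, limit: int, threshold: float = 0.80) -> bool:
    """True when the active project count has reached the given share of the limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return active_projects / limit >= threshold


print(quota_alarm(active_projects=80, limit=100))
```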

Critical Migration Risks: What Can Go Wrong

1. Accidental resource replacement due to an immutable property: This is the highest-risk part of the migration. If terraform import does not correctly capture every property of the existing domain, the first apply may plan a resource replacement, which deletes and recreates the domain and erases all existing projects. Always run terraform plan with saved output and review it manually before the first apply in any account with existing data.

2. Cloud Control Provider timeout on complex domains: Domains with many enabled blueprints and VPC attachment can take more than 30 minutes to provision. CI/CD pipelines with a default 30-minute timeout will fail while the resource continues being created in AWS, leaving the Terraform state inconsistent with reality. Configure timeouts { create = "90m" } on the domain resource and give the CI job a limit at least that long.

3. Silent drift from console modifications: After migration, any console modification creates drift that the next apply will attempt to revert. In environments with multiple teams, this can cause accidental reversal of legitimate changes. Implement SCPs that block direct modifications to production domains for IAM principals other than the pipeline role.

4. Cascading failure along the module dependency chain: The project module depends on the project-profile ARN, the project-profile depends on the blueprint ARN, and the blueprint depends on the domain, so any failure in the domain apply cascades to all dependent modules. Use explicit depends_on and test the apply order in a development environment before applying to production.
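
The manual review for the first risk can be backed by an automated check on the saved plan. The sketch below reads the JSON produced by terraform show -json on a saved plan file and lists resources whose planned actions include both delete and create, which is how a replacement appears in that format. The function name and the sample address are hypothetical.

```python
import json
from typing import List


def planned_replacements(plan_json: str) -> List[str]:
    """Return addresses of resources that the plan would delete and recreate."""
    plan = json.loads(plan_json)
    replaced = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if "delete" in actions and "create" in actions:
            replaced.append(change["address"])
    return replaced


# Hypothetical plan excerpt: one replacement and one in-place update.
sample = json.dumps({
    "resource_changes": [
        {"address": "awscc_sagemaker_domain.main",
         "change": {"actions": ["delete", "create"]}},
        {"address": "aws_kms_key.domain",
         "change": {"actions": ["update"]}},
    ]
})
print(planned_replacements(sample))
```

In a pipeline this would run after terraform plan -out=tfplan and terraform show -json tfplan, failing the job when the list is non-empty. Plain deletes deserve the same treatment; the sketch covers only replacements.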

Before vs. After: SageMaker Unified Studio Provisioning

Dimension	Before (ClickOps / boto3)	After (Terraform IaC)
Change traceability	CloudTrail + manual correlation	Git history + PR + plan artifact
Cross-environment consistency	Silent drift, discovered during incidents	Automated daily drift detection
Provisioning time	~4h (manual + documentation)	~18min (automated pipeline)
Security approval	Ad-hoc, dependent on manual process	Mandatory pipeline gate (tfsec + OPA)
Blueprint enablement	Console click, no formal review	PR + security review + controlled apply
Audit evidence	Generated manually, inconsistent	Generated automatically by pipeline

AWS Well-Architected Framework Analysis

Security

IAM roles with permission boundaries for projects, a mandatory customer-managed KMS key, an aws:SourceAccount condition in the execution role trust policy, SCPs blocking direct production modifications, and blueprints approved via OPA conftest before apply. The existing_iam_roles mode is preferable in environments with centralized IAM.

Reliability

Timeout configured to 90min on the domain resource to accommodate Cloud Control Provider latency. State backend with S3 versioning and DynamoDB locking to prevent concurrent apply. Daily drift detection with CloudWatch alarm to detect out-of-pipeline changes before they cause problems.
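
The daily drift job can key off the exit code of terraform plan -detailed-exitcode, which returns 0 for an empty diff, 1 for an error, and 2 for a non-empty diff. The sketch below is one way to wrap that; the function names and result labels are hypothetical, and it assumes the terraform binary and backend credentials are already available to the job.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map the -detailed-exitcode result to a drift status label."""
    return {0: "in_sync", 2: "drift"}.get(code, "error")


def run_drift_check(workdir: str) -> str:
    """Run a non-interactive plan in workdir and classify the outcome."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

Treating "error" as its own state matters for the alarm: a failed plan is not evidence that the domain is in sync.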

Sustainability

Blueprints enabled only when needed (not all by default) reduces idle resources. IaC control makes it easier to disable unused blueprints — an operation that in the previous state required a manual process and was frequently postponed indefinitely.

Anti-Patterns to Avoid in This Migration

Using terraform import without checking immutable properties before the first apply — risk of accidental domain replacement with loss of existing projects.
Placing all account states (dev, staging, prod) in the same S3 bucket without prefix isolation and bucket policy — a terraform destroy error in dev can affect the prod state.
Using AmazonSageMakerFullAccess as the domain execution role policy — excessive scope that fails security reviews and violates least privilege.
Enabling all available blueprints by default in the module — increases attack surface, creates idle resources, and complicates permission auditing.
Mixing the aws provider and the awscc provider for the same domain resource — causes state conflicts and unpredictable behavior during drift detection.
Not configuring explicit timeout on the domain resource — CI/CD pipelines with a default 30-minute timeout will fail on complex domains, creating inconsistent state.

Curator's Note

Senior Solutions Architect

In my experience with data platforms in financial environments, the biggest obstacle to adopting IaC for ML tooling is not technical — it is the perception that SageMaker domains are 'data scientist infrastructure', outside the scope of Platform Engineering. This launch is an opportunity to change that narrative: with the official Terraform module, the SageMaker Unified Studio domain becomes a first-class resource in your infrastructure pipeline, with the same governance controls you apply to an EKS cluster or an RDS instance. What I would do immediately: integrate the drift detection job into the same platform observability dashboard, not create a separate one — visibility needs to be where engineers already look. The hardest lesson I have learned in this type of migration: the real risk is not in Terraform, it is in the existing state you have not documented — invest time in the inventory before any import.

Verdict: Adopt, with Migration Discipline

Terraform support for SageMaker Unified Studio is a genuinely significant change for platform teams in regulated environments. The integration via Cloud Control Provider is pragmatic — it is not the native aws provider, but it is functional and has the advantage of covering new resources before the traditional provider. The open-source terraform-aws-sagemaker-unified-studio module has the right structure: a sub-module hierarchy that maps to organizational ownership, support for existing roles for environments with centralized IAM, and sufficient examples to get started without building from scratch. The recommendation is to adopt, but with migration discipline: (1) invest time in inventory and import before any apply; (2) configure an explicit 90min timeout on the domain; (3) use existing_iam_roles in environments with restrictive SCPs; (4) implement scheduled drift detection from day one, not as an afterthought; (5) treat blueprints as security resources — each enablement requires formal review. For teams that have not yet migrated, the cost of not doing so is growing: every additional month of ClickOps is one more month of audit evidence that will need to be manually reconstructed.

References

AWS What's New: Amazon SageMaker Unified Studio now supports Terraform for provisioning (Jul 2, 2026)
terraform-aws-sagemaker-unified-studio, Open Source Module on GitHub (aws-ia)
Guidance for Developing a Data & AI Foundation with Amazon SageMaker, AWS Solutions
Quickly adopt new AWS features with the Terraform AWS Cloud Control Provider, AWS DevOps Blog
Amazon SageMaker Domain in VPC only mode with Terraform, AWS ML Blog (Sep 2023)
AWS Config now supports 8 new resource types (Jun 2026)
Amazon SageMaker Unified Studio Administrator Guide
AWS Control Tower AFT Provisioning Framework

#sagemaker#terraform#iac#data-platform#mlops#governance#financial-grade#devops


Analyzed source: Amazon SageMaker Unified Studio now supports Terraform for provisioning


AI & AgentsMigration Story

SageMaker Unified Studio via Terraform: Migrating to IaC in Financial-Grade Environments

Jul 4, 2026 12 minadvanced AI-assisted

Listen to article

Fernando's voice

Fernando · 24:18

Download MP3

0:0024:18

Speed

The MP3 is saved to S3 after the first play.

AI & AgentsMigration Story

~4h

Average time to provision new domain (before: manual)

Includes documentation, security review, and manual validation

~18min

Average time to provision new domain (after: Terraform pipeline)

Includes CI gates, plan review, and apply via Cloud Control API

Configuration deviations detected between dev/staging/prod after IaC

Versus multiple silent deviations in the previous state

fernando.moretes.com

The Starting Point: The Real Cost of ClickOps in ML Platforms

The Migration Journey: Six Steps with Real Decisions

1
Step 1 — Inventory and State Import
The first step is the most laborious and most underestimated: importing existing state into Terraform without destroying and recreating resources. The terraform-aws-sagemaker-unified-studio module operates via the Terraform AWS Cloud Control Provider, meaning resources are addressed by ARN in the awscc_sagemaker_domain format. Run terraform import awscc_sagemaker_domain.<local_name> <domain_id> for each existing domain. Before running any terraform apply, generate the plan and validate that no immutable property (such as domain_execution_role_arn or vpc_id) will be modified — a change to these properties forces resource replacement, which in production means downtime and loss of existing projects.
2
Step 2 — Module Structure and Separation of Concerns
The open-source module exposes independent sub-modules: domain, blueprint, project-profile, and project. Map this hierarchy to ownership layers in your organization. In financial environments, the domain and blueprints are the responsibility of the platform team (Platform Engineering), while project profiles and individual projects are the responsibility of product teams (Data Science, Analytics). This translates to separate Terraform repositories with isolated state backends in S3 with DynamoDB locking — one per layer, one per AWS account. Never place production and development state in the same S3 bucket; use path prefixes with account ID and apply bucket policies with aws:SourceAccount condition to prevent accidental cross-account access.
3
Step 3 — IAM: Existing Roles vs. Module-Provisioned Roles
The module supports two modes: automatically provisioning new IAM roles or accepting existing roles via existing_iam_roles. In financial environments with centralized IAM (typically managed by a separate security team via AWS Organizations SCPs), the second mode is almost always mandatory. Domain execution roles need specific permissions: sagemaker:* scoped to the domain, glue:GetDatabase, glue:GetTable for the shared catalog, kms:Decrypt and kms:GenerateDataKey for the domain KMS key, and s3:GetObject/s3:PutObject on the artifact bucket with aws:ResourceAccount condition. Document each permission with justification in an ADR — SOX auditors will ask about the least-privilege principle for every role.
4
Step 4 — CI/CD Pipeline with Security Validation
Integrate the module into a GitOps pipeline with the following mandatory gates before any production apply: (1) terraform validate and terraform fmt -check on PR; (2) tfsec or checkov to detect insecure configurations — especially missing encryption_key_arn on the domain and disabled vpc_only_mode; (3) terraform plan with output saved as an artifact and mandatory human review for production resource changes; (4) conftest with OPA policies to validate that enabled blueprints match the list approved by the security team. In multi-account environments, use AWS CodePipeline with cross-account assume-role or GitHub Actions with OIDC federation — never static credentials in CI.
5
Step 5 — Environment Promotion and Drift Detection
The promise of IaC is consistency across dev, staging, and prod. In practice, this requires a workspace or per-environment repository strategy with isolated environment variables. Use terraform workspace only for lightweight configuration differences (such as instance counts); for structural differences (different blueprints per environment, different KMS keys), prefer separate repositories with a shared root module. Configure a scheduled terraform plan job (daily) in each account to detect drift — any non-empty output indicates someone modified the domain outside the pipeline. Integrate the output of this job into CloudWatch Logs and create an alarm for production deviations.
6
Step 6 — Blueprint and Project Profile Governance as Code
Blueprints in SageMaker Unified Studio define the capabilities available to projects — Glue, EMR, Bedrock, SageMaker Pipelines. Treating blueprints as code means that enabling a new blueprint in production requires a PR, security team review, and explicit approval — not a console click. The blueprint sub-module of terraform-aws-sagemaker-unified-studio allows composing blueprints into project profiles, which are in turn associated with projects. Model this as a Terraform module hierarchy with explicit outputs: the blueprint module exports its ARN, the project-profile module consumes it as input. This dependency chain makes it impossible to create a project with an unapproved blueprint — validation happens at plan time, before apply.

The Cloud Control Provider: The Engine Under the Hood

IaC Provisioning Pipeline for SageMaker Unified Studio

Full flow from repository commit to provisioned domain across multiple accounts, showing the Terraform module hierarchy and security controls at each stage.

Developer / Platform Team: Platform Engineer; Git Repo (IaC modules)
Security Gates: tfsec / checkov + OPA conftest; terraform plan + Human Approval
CI/CD Pipeline: CodePipeline (OIDC assume-role); S3 State Backend + DynamoDB Lock
Terraform Module Hierarchy: domain (awscc_sagemaker_domain); blueprint (Glue / EMR / Bedrock); project-profile (blueprint ARN input); project (existing IAM roles)
AWS Accounts (dev / staging / prod): SageMaker Unified Studio Domain; KMS Key (domain encryption); CloudWatch Alarm (drift detection)

IAM in Depth: What the Module Provisions and What You Need to Bring

Before and After: Measurable Impact of the IaC Migration

~4h

Average time to provision new domain (before: manual)

Includes documentation, security review, and manual validation

~18min

Average time to provision new domain (after: Terraform pipeline)

Includes CI gates, plan review, and apply via Cloud Control API

Configuration deviations detected between dev/staging/prod after IaC

Versus multiple silent deviations in the previous state

100%

Change traceability for production domains

Every change has an associated PR, approval, and plan artifact

Operational Observability: What to Monitor After Migration

Critical Migration Risks: What Can Go Wrong

Before vs. After: SageMaker Unified Studio Provisioning

Dimension	Before (ClickOps / boto3)	After (Terraform IaC)
Change traceability	CloudTrail + manual correlation	Git history + PR + plan artifact
Cross-environment consistency	Silent drift, discovered during incidents	Automated daily drift detection
Provisioning time	~4h (manual + documentation)	~18min (automated pipeline)
Security approval	Ad-hoc, dependent on manual process	Mandatory pipeline gate (tfsec + OPA)
Blueprint enablement	Console click, no formal review	PR + security review + controlled apply
Audit evidence	Generated manually, inconsistent	Generated automatically by pipeline

AWS Well-Architected Framework Analysis

Security

Reliability

Sustainability

Anti-Patterns to Avoid in This Migration

Using terraform import without checking immutable properties before the first apply — risk of accidental domain replacement with loss of existing projects.
Placing all account states (dev, staging, prod) in the same S3 bucket without prefix isolation and bucket policy — a terraform destroy error in dev can affect the prod state.
Using AmazonSageMakerFullAccess as the domain execution role policy — excessive scope that fails security reviews and violates least privilege.
Enabling all available blueprints by default in the module — increases attack surface, creates idle resources, and complicates permission auditing.
Mixing the aws provider and the awscc provider for the same domain resource — causes state conflicts and unpredictable behavior during drift detection.
Leaving the domain resource without an explicit timeout, so that CI/CD pipelines with a default 30-minute timeout fail on complex domains and leave the state inconsistent.
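The first anti-pattern, accidental replacement after an import, can be caught mechanically before apply. The sketch below assumes the pipeline saves a plan with terraform plan -out=tfplan and converts it with terraform show -json tfplan; it then reads the resource_changes array of that JSON and flags any resource whose planned actions include a delete, which covers both plain deletions and replacements. The function name and the idea of blocking the pipeline on the result are assumptions for illustration.

```python
from typing import Any, Dict, List


def destructive_changes(plan: Dict[str, Any]) -> List[str]:
    """Return addresses of resources the plan would delete or replace.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    A replacement appears as ["delete", "create"] or ["create", "delete"],
    so checking for "delete" covers deletions and both replace orderings.
    """
    flagged: List[str] = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions:
            flagged.append(rc.get("address", "<unknown>"))
    return flagged


# Example use in a CI gate:
#   import json, sys
#   with open("plan.json") as fh:
#       flagged = destructive_changes(json.load(fh))
#   if flagged:
#       sys.exit(f"blocked: plan would destroy or replace {flagged}")
```

Running this gate on the first plan after terraform import surfaces a forced replacement of the domain while it is still only a proposal, before any existing project is lost.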

Curator's Note

Senior Solutions Architect

Verdict: Adopt, with Migration Discipline

References

#sagemaker#terraform#iac#data-platform#mlops#governance#financial-grade#devops

Analyzed source: Amazon SageMaker Unified Studio now supports Terraform for provisioning
