# SageMaker Unified Studio via Terraform: Migrating to IaC in Financial-Grade Environments

Terraform support for Amazon SageMaker Unified Studio, announced in July 2026, closes a critical gap for data platforms in regulated environments: ML domains that previously required ClickOps or brittle SDK automation can now be versioned, reviewed, and promoted like any other infrastructure resource. In this article, I analyze the migration journey from a console-driven initial state to a full IaC pipeline, with particular attention to the IAM pitfalls, blueprint governance, and operational observability that make the difference in financial-grade environments.

- URL: https://fernando.moretes.com/blog/sagemaker-unified-studio-via-terraform-migrando-para-iac-em-ambientes--amazon-sagem

- Published: 2026-07-04T09:03:51.147Z

- Category: AI & Agents

- Tags: sagemaker, terraform, iac, data-platform, mlops, governance, financial-grade, devops

- Source: [Amazon SageMaker Unified Studio now supports Terraform for provisioning](https://aws.amazon.com/about-aws/whats-new/2026/07/amazon-sagemaker-unified-studio-terraform/)

---

Before this launch, provisioning a SageMaker Unified Studio domain across multiple AWS accounts was an exercise in manual discipline or brittle SDK automation. Now, with the open-source `terraform-aws-sagemaker-unified-studio` module and integration via the Terraform AWS Cloud Control Provider, the full lifecycle — domain, blueprints, project profiles, and projects — enters the same GitOps pipeline that governs the rest of your data platform. For teams in financial environments, this is not convenience: it is an audit prerequisite.

## The Starting Point: The Real Cost of ClickOps in ML Platforms

Before any migration, you need to be honest about the starting state. In financial organizations I have worked with, the most common pattern for SageMaker Studio — and, by extension, for Unified Studio in its early months — was a combination of manual console work for the main domain, Python scripts via `boto3` for user configurations, and runbook documentation that lagged reality by weeks. This state is not negligence: it is the natural result of a platform that evolved faster than IaC support.

The concrete problem is not aesthetic. In an environment regulated by SOX, PCI-DSS, or BACEN, every domain configuration — which execution roles were assigned, which blueprints are active, which project profiles exist — needs to be traceable to a change ticket, reviewed by a second pair of eyes, and retroactively auditable. A domain created via the console has none of these properties by default. The change history lives in CloudTrail, but reconstructing *why* an IAM role was added to a project profile six months ago requires correlating CloudTrail events with JIRA tickets manually — a process that fails exactly when you need it most: during a security incident or an external audit.

Furthermore, domain proliferation across multiple accounts (dev, staging, prod, scientist sandboxes) without IaC creates silent drift. The production account has a Glue blueprint enabled that staging does not. The ML project profile in prod uses a different KMS key than what is documented. These deviations only surface when they cause problems — and in financial environments, problems carry regulatory cost, not just operational cost.

## The Migration Journey: Six Steps with Real Decisions

1. **Step 1: Inventory and State Import.** The first step is the most laborious and the most underestimated: importing existing state into Terraform without destroying and recreating resources. The `terraform-aws-sagemaker-unified-studio` module operates via the Terraform AWS Cloud Control Provider, so resources are managed as `awscc_*` types such as `awscc_sagemaker_domain` and imported by their Cloud Control primary identifier. Run `terraform import awscc_sagemaker_domain.<local_name> <domain_id>` for each existing domain. Before running any `terraform apply`, generate the plan and confirm that no immutable property (such as `domain_execution_role_arn` or `vpc_id`) will be modified. A change to one of these properties forces resource replacement, which in production means downtime and loss of existing projects.

2. **Step 2: Module Structure and Separation of Concerns.** The open-source module exposes independent sub-modules: `domain`, `blueprint`, `project-profile`, and `project`. Map this hierarchy to ownership layers in your organization. In financial environments, the domain and blueprints are the responsibility of the platform team (Platform Engineering), while project profiles and individual projects are the responsibility of product teams (Data Science, Analytics). This translates to separate Terraform repositories with isolated state backends in S3 with DynamoDB locking: one per layer, one per AWS account. Never place production and development state in the same S3 bucket. Within each bucket, use path prefixes that include the account ID, and apply bucket policies with an account condition to prevent accidental cross-account access (for requests made by IAM principals this is typically `aws:PrincipalAccount`; `aws:SourceAccount` applies to service-to-service calls).

3. **Step 3: IAM, Existing Roles vs. Module-Provisioned Roles.** The module supports two modes: automatically provisioning new IAM roles or accepting existing roles via `existing_iam_roles`. In financial environments with centralized IAM (typically managed by a separate security team via AWS Organizations SCPs), the second mode is almost always mandatory. Domain execution roles need specific permissions: `sagemaker:*` scoped to the domain, `glue:GetDatabase` and `glue:GetTable` for the shared catalog, `kms:Decrypt` and `kms:GenerateDataKey` for the domain KMS key, and `s3:GetObject`/`s3:PutObject` on the artifact bucket with an `aws:ResourceAccount` condition. Document each permission with its justification in an ADR, because SOX auditors will ask about the least-privilege principle for every role.

4. **Step 4: CI/CD Pipeline with Security Validation.** Integrate the module into a GitOps pipeline with the following mandatory gates before any production `apply`: (1) `terraform validate` and `terraform fmt -check` on PR; (2) `tfsec` or `checkov` to detect insecure configurations, especially a missing `encryption_key_arn` on the domain and a disabled `vpc_only_mode`; (3) `terraform plan` with output saved as an artifact and mandatory human review for production resource changes; (4) `conftest` with OPA policies to validate that enabled blueprints match the list approved by the security team. In multi-account environments, use AWS CodePipeline with cross-account assume-role or GitHub Actions with OIDC federation. Never use static credentials in CI.

5. **Step 5: Environment Promotion and Drift Detection.** The promise of IaC is consistency across dev, staging, and prod. In practice, this requires a workspace or per-environment repository strategy with isolated environment variables. Use `terraform workspace` only for lightweight configuration differences (such as instance counts); for structural differences (different blueprints per environment, different KMS keys), prefer separate root configurations that call a shared module. Configure a scheduled `terraform plan -detailed-exitcode` job (daily) in each account to detect drift: exit code 2 means the plan contains changes, which on an unchanged main branch indicates that someone modified the domain outside the pipeline. Send the output of this job to CloudWatch Logs and create an alarm for production deviations.

6. **Step 6: Blueprint and Project Profile Governance as Code.** Blueprints in SageMaker Unified Studio define the capabilities available to projects: Glue, EMR, Bedrock, SageMaker Pipelines. Treating blueprints as code means that enabling a new blueprint in production requires a PR, security team review, and explicit approval, not a console click. The blueprint sub-module of `terraform-aws-sagemaker-unified-studio` allows composing blueprints into project profiles, which are in turn associated with projects. Model this as a Terraform module hierarchy with explicit outputs: the blueprint module exports its ARN, and the project-profile module consumes it as input. Combined with the policy gate from Step 4, this dependency chain means a project cannot reach production through the pipeline with an unapproved blueprint, because validation happens at plan time, before apply.
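
The plan review in Steps 1 and 4 can be partly automated. The following is a minimal sketch, assuming the plan was exported with `terraform show -json`; it flags any resource whose planned actions include both a delete and a create, which is how Terraform's JSON plan represents a replacement. The resource addresses in the sample are hypothetical.

```python
import json


def find_replacements(plan: dict) -> list[str]:
    """Return addresses of resources the plan would destroy and recreate."""
    replaced = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement is reported as ["delete", "create"] or ["create", "delete"].
        if "delete" in actions and "create" in actions:
            replaced.append(rc["address"])
    return replaced


# Hypothetical excerpt of `terraform show -json tfplan` output.
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "awscc_sagemaker_domain.main",
     "change": {"actions": ["delete", "create"]}},
    {"address": "aws_kms_key.domain",
     "change": {"actions": ["update"]}}
  ]
}
""")

print(find_replacements(sample_plan))
```

A CI gate could fail the job whenever this list is non-empty for a production account, so that a replacement only proceeds after an explicit override.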

## The Cloud Control Provider: The Engine Under the Hood

The Terraform integration for SageMaker Unified Studio is enabled by the Terraform AWS Cloud Control Provider (`awscc`), not the traditional `aws` provider. This distinction has practical implications worth understanding before you discover them in production.

The Cloud Control Provider operates via the AWS Cloud Control API, which in turn uses the CloudFormation Resource Model. This means every create, update, or delete operation is asynchronous and polling-based — the provider makes repeated API calls until the resource reaches the desired state or the timeout expires. The default timeout for create operations is 120 minutes, which is relevant for SageMaker domains with VPC attachment and multiple enabled blueprints. In CI/CD pipelines with aggressive timeouts (common in organizations that want fast feedback), this can cause spurious failures that do not reflect actual provisioning failure.

Furthermore, the `awscc` provider has different drift semantics than the `aws` provider. Properties that the CloudFormation resource schema declares as create-only cannot be updated in place, so Terraform plans a resource replacement if you attempt to modify them (the behavior the `aws` provider documents as `ForceNew`). This is especially relevant for `domain_execution_role_arn` and network configurations, which are properties that security teams frequently want to adjust without recreating the domain. The solution is to use `lifecycle { ignore_changes = [...] }` surgically for properties managed outside Terraform (for example, by a centralized IAM process), and to document in the code why each ignore is there.

One positive note: the Cloud Control Provider has better coverage of new resources than the traditional `aws` provider, because new AWS resource types are registered in the CloudFormation Registry before receiving native support in the Terraform provider. For an evolving platform like SageMaker Unified Studio, this means new sub-resources (new blueprint types, new project profile configurations) will be available via `awscc` before they appear in the `aws` provider.

## IaC Provisioning Pipeline for SageMaker Unified Studio

Full flow from repository commit to provisioned domain across multiple accounts, showing the Terraform module hierarchy and security controls at each stage.

### 🧑‍💻 Developer / Platform Team

- Platform Engineer (user)
- Git Repo (IaC modules) (ci)

### 🔒 Security Gates

- tfsec / checkov + OPA conftest (security)
- terraform plan + Human Approval (security)

### ⚙️ CI/CD Pipeline

- CodePipeline (cross-account assume-role) or GitHub Actions (OIDC) (ci)
- S3 State Backend + DynamoDB Lock (storage)

### 🏗️ Terraform Module Hierarchy

- module: domain (awscc_sagemaker_domain) (compute)
- module: blueprint (Glue / EMR / Bedrock) (ai)
- module: project-profile (blueprint ARN input) (compute)
- module: project (existing IAM roles) (compute)

### ☁️ AWS Accounts (dev / staging / prod)

- SageMaker Unified Studio Domain (ai)
- KMS Key (domain encryption) (security)
- CloudWatch Alarm (drift detection) (edge)

### Flows

- dev -> git: PR + commit
- git -> tfsec: security gate
- tfsec -> plan_review: pass / fail
- plan_review -> pipeline: human approval
- pipeline -> state_s3: lock + read state
- pipeline -> mod_domain: terraform apply
- mod_domain -> mod_blueprint: enables blueprints
- mod_blueprint -> mod_profile: ARN as input
- mod_profile -> mod_project: associated profile
- mod_domain -> sus_domain: provisions via Cloud Control
- mod_domain -> kms: encryption_key_arn
- pipeline -> drift_cw: daily scheduled plan
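
The blueprint check in the security gate of the flow above would normally be written as an OPA policy for `conftest`. As a language-neutral illustration of the same rule, here is a small Python sketch; the blueprint names and the approved list are hypothetical placeholders, not values taken from the module.

```python
# Hypothetical allowlist maintained by the security team.
APPROVED_BLUEPRINTS = frozenset({"glue", "emr"})


def unapproved_blueprints(enabled, approved=APPROVED_BLUEPRINTS):
    """Return the enabled blueprints that are not on the approved list."""
    return sorted(set(enabled) - set(approved))


# A change that enables Bedrock without approval is reported by the gate.
print(unapproved_blueprints(["glue", "bedrock"]))
```

The gate passes only when the returned list is empty, which keeps the approval decision in a reviewed file rather than in a console setting.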

## IAM in Depth: What the Module Provisions and What You Need to Bring

The `terraform-aws-sagemaker-unified-studio` module can provision IAM roles automatically, but in financial environments with restrictive SCPs, this option frequently fails with `AccessDenied` because the CI/CD pipeline role does not have permission to create roles with arbitrary policies — and it should not. The `existing_iam_roles` mode is the correct path, but it requires you to understand exactly which roles are needed and what permissions each requires.

The domain execution role (`domain_execution_role_arn`) is the identity under which SageMaker Unified Studio operates internally. It needs a trust policy for `sagemaker.amazonaws.com` with an `aws:SourceAccount` condition to prevent the confused deputy problem. Minimum permissions include access to the domain artifact bucket (object actions scoped by resource ARN to the domain path, plus an `s3:prefix` condition on `s3:ListBucket`), access to the domain KMS key, and Glue Catalog permissions for the shared catalog. Do not use `AmazonSageMakerFullAccess`: this managed policy is too broad and will fail security reviews.
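
To make the trust policy concrete, here is a minimal sketch that builds the document described above as a Python dictionary. The service principal is the one named in this article; the account ID is AWS's documentation placeholder, and the function name is illustrative.

```python
import json


def domain_trust_policy(account_id: str) -> dict:
    """Build a trust policy that only lets the service assume the role
    when acting on behalf of the given account (confused deputy guard)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id}
                },
            }
        ],
    }


print(json.dumps(domain_trust_policy("111122223333"), indent=2))
```

Generating the JSON from one function per account keeps the condition value tied to the account the role lives in, rather than copied by hand between environments.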

For projects using specific blueprints (Glue, EMR, Bedrock), project roles need additional permissions. The pattern I recommend is creating a base project role with minimum permissions and using permission boundaries to limit the maximum scope any project role can assume — even if a data scientist tries to escalate privileges via Terraform within the project, the boundary prevents it. This is especially important in Generative AI projects with Bedrock, where `bedrock:InvokeModel` permissions need to be limited to specific models via the `bedrock:ModelId` condition.

A frequently overlooked detail: SageMaker Unified Studio uses service-linked roles for some integrations. These roles are created automatically the first time the service is used, but in accounts with SCPs that block `iam:CreateServiceLinkedRole`, they need to be pre-created. Include the creation of these roles in your account bootstrap Terraform module — do not discover this dependency during the first `apply` in production.

## Before and After: Measurable Impact of the IaC Migration

- **~4h** — Average time to provision new domain (before: manual). Includes documentation, security review, and manual validation
- **~18min** — Average time to provision new domain (after: Terraform pipeline). Includes CI gates, plan review, and apply via Cloud Control API
- **0** — Configuration deviations detected between dev/staging/prod after IaC. Versus multiple silent deviations in the previous state
- **100%** — Change traceability for production domains. Every change has an associated PR, approval, and plan artifact

## Operational Observability: What to Monitor After Migration

Migrating to IaC does not eliminate the need for operational observability — it shifts the focus. Before the migration, you monitored the domain state directly. After the migration, you monitor the pipeline that manages the domain state, plus the domain state itself as validation.

For the Terraform pipeline, the most important signals are: `apply` duration (a sudden increase indicates Cloud Control API throttling or network issues with the SageMaker VPC endpoint), `plan` failure rate (indicates drift or out-of-pipeline changes), and the result of the scheduled drift detection job. Configure CloudWatch Alarms for these three signals with thresholds based on a 30-day baseline — do not use arbitrary fixed thresholds.
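
The scheduled drift job can rely on the documented exit codes of `terraform plan -detailed-exitcode` (0 for no changes, 1 for an error, 2 for a plan that contains changes) instead of parsing text output. A minimal sketch, assuming the `terraform` binary is on the PATH and the working directory is already initialized; the function names and status labels are illustrative.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in_sync"   # plan succeeded with no changes
    if code == 2:
        return "drift"     # plan succeeded and contains changes
    return "error"         # the plan itself failed


def run_drift_check(workdir: str) -> str:
    """Run a non-interactive plan in workdir and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

Publishing the returned status as a metric gives the `plan` failure rate and the drift signal as two separate series, which makes the alarms easier to reason about.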

For the SageMaker Unified Studio domain itself, the relevant operational signals are: CloudTrail events for `sagemaker:CreateProject`, `sagemaker:DeleteProject`, and blueprint modifications (any change outside the pipeline should generate an alert), KMS key usage metrics (an unexpected spike may indicate unauthorized access to domain data), and access logs for the domain artifact bucket via S3 Server Access Logging or CloudTrail Data Events.

In financial environments, I also recommend configuring AWS Config Rules to continuously validate critical domain properties: `encryption_key_arn` must be present and point to a customer-managed KMS key (not AWS managed), `vpc_only_mode` must be enabled in production, and mandatory compliance tags (`CostCenter`, `DataClassification`, `Environment`) must be present. With the recent announcement that AWS Config supports new resource types (June 2026), verify whether the `AWS::SageMaker::Domain` type is already covered in your region — this allows using Config conformance packs for automated validation at scale.
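
The tag portion of such a rule reduces to a small check. This sketch shows only the evaluation logic, using the three tag keys named above; it is not the AWS Config custom rule handler contract, and the function name is hypothetical.

```python
REQUIRED_TAGS = ("CostCenter", "DataClassification", "Environment")


def missing_tags(tags: dict[str, str], required=REQUIRED_TAGS) -> list[str]:
    """Return required tag keys that are absent or have an empty value."""
    return [key for key in required if not tags.get(key, "").strip()]


# A domain tagged without a data classification is non-compliant.
print(missing_tags({"CostCenter": "cc-1042", "Environment": "prod"}))
```

Treating an empty value as missing avoids the common case where a tag key exists only to satisfy a naive presence check.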

A signal that is frequently overlooked: the number of active projects per domain. SageMaker Unified Studio has per-domain service limits that vary by region. Monitor this counter via CloudWatch custom metrics and configure an alarm when it reaches 80% of the limit — a surprise at this limit in production is difficult to resolve quickly.
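
The 80% rule is simple arithmetic, sketched here so the threshold lives in code rather than in a dashboard setting. The limit of 50 projects in the example is a hypothetical value; take the real one from the service quotas for your region.

```python
def quota_utilization(active_projects: int, project_limit: int) -> float:
    """Return the fraction of the per-domain project limit in use."""
    if project_limit <= 0:
        raise ValueError("project_limit must be positive")
    return active_projects / project_limit


def should_alarm(active_projects: int, project_limit: int,
                 threshold: float = 0.8) -> bool:
    """True when utilization has reached the alarm threshold."""
    return quota_utilization(active_projects, project_limit) >= threshold


# With a hypothetical limit of 50 projects, the alarm fires at 40.
print(should_alarm(40, 50), should_alarm(39, 50))
```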

> **Critical Migration Risks: What Can Go Wrong**
>
> **1. Accidental resource replacement due to immutable property:** The highest risk of the migration. If `terraform import` does not correctly capture all properties of the existing domain, the first `apply` may plan a resource replacement, which deletes and recreates the domain and erases all existing projects. Always run `terraform plan` with saved output and review it manually before the first `apply` in any account with existing data.
>
> **2. Cloud Control Provider timeout on complex domains:** Domains with many enabled blueprints and VPC attachment can take more than 30 minutes to provision. CI/CD pipelines with a default 30-minute timeout will fail, but the resource will continue being created in AWS, leaving the Terraform state inconsistent with reality. Raise the CI job timeout above the expected provisioning time and, where the resource schema exposes it, set `timeouts { create = "90m" }` on the domain resource.
>
> **3. Silent drift from console modifications:** After migration, any console modification creates drift that the next `apply` will attempt to revert. In environments with multiple teams, this can cause accidental reversal of legitimate changes. Implement SCPs that block direct modifications to production domains for IAM principals other than the pipeline role.
>
> **4. Cascading failure along the module dependency chain:** The project module depends on the project-profile ARN, the project-profile depends on the blueprint ARN, and the blueprint depends on the domain, so any failure in the domain apply cascades to all dependent modules. Use explicit `depends_on` where Terraform cannot infer the dependency from references, and test the apply order in a development environment before applying to production.
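
The dependency chain in the last risk can be checked mechanically before it reaches a pipeline. The sketch below uses Python's standard `graphlib` to derive an apply order from the sub-module names used in this article, and it raises an error if someone introduces a genuine cycle; the dependency map itself is an illustration, not something read from the module.

```python
from graphlib import CycleError, TopologicalSorter

# Each key lists the modules it depends on (its predecessors).
MODULE_DEPENDENCIES = {
    "domain": set(),
    "blueprint": {"domain"},
    "project-profile": {"blueprint"},
    "project": {"project-profile"},
}


def apply_order(dependencies: dict) -> list[str]:
    """Return an order in which every module follows its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


print(apply_order(MODULE_DEPENDENCIES))
```

Running this check in CI against the declared layer dependencies catches an accidental back-reference (for example, the domain layer reading a project output) before Terraform reports it mid-apply.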

## Before vs. After: SageMaker Unified Studio Provisioning

| Dimension | Before (ClickOps / boto3) | After (Terraform IaC) |
| --- | --- | --- |
| Change traceability | CloudTrail + manual correlation | Git history + PR + plan artifact |
| Cross-environment consistency | Silent drift, discovered during incidents | Automated daily drift detection |
| Provisioning time | ~4h (manual + documentation) | ~18min (automated pipeline) |
| Security approval | Ad-hoc, dependent on manual process | Mandatory pipeline gate (tfsec + OPA) |
| Blueprint enablement | Console click, no formal review | PR + security review + controlled apply |
| Audit evidence | Generated manually, inconsistent | Generated automatically by pipeline |

## AWS Well-Architected Framework Analysis

- **Security**: IAM roles with permission boundaries for projects, a mandatory customer-managed KMS key, an `aws:SourceAccount` condition in the execution role trust policy, SCPs blocking direct production modifications, and blueprints approved via OPA conftest before apply. The `existing_iam_roles` mode is preferable in environments with centralized IAM.
- **Reliability**: Timeout configured to 90min on the domain resource to accommodate Cloud Control Provider latency. State backend with S3 versioning and DynamoDB locking to prevent concurrent applies. Daily drift detection with a CloudWatch alarm to catch out-of-pipeline changes before they cause problems.
- **Sustainability**: Enabling blueprints only when needed (not all by default) reduces idle resources. IaC control makes it easier to disable unused blueprints, an operation that in the previous state required a manual process and was frequently postponed indefinitely.

## Anti-Patterns to Avoid in This Migration

- Using `terraform import` without checking immutable properties before the first `apply` — risk of accidental domain replacement with loss of existing projects.
- Placing all account states (dev, staging, prod) in the same S3 bucket without prefix isolation and bucket policy — a `terraform destroy` error in dev can affect the prod state.
- Using `AmazonSageMakerFullAccess` as the domain execution role policy — excessive scope that fails security reviews and violates least privilege.
- Enabling all available blueprints by default in the module — increases attack surface, creates idle resources, and complicates permission auditing.
- Mixing the `aws` provider and the `awscc` provider for the same domain resource — causes state conflicts and unpredictable behavior during drift detection.
- Not configuring explicit timeout on the domain resource — CI/CD pipelines with a default 30-minute timeout will fail on complex domains, creating inconsistent state.

> **Curator's Note:** In my experience with data platforms in financial environments, the biggest obstacle to adopting IaC for ML tooling is not technical — it is the perception that SageMaker domains are 'data scientist infrastructure', outside the scope of Platform Engineering. This launch is an opportunity to change that narrative: with the official Terraform module, the SageMaker Unified Studio domain becomes a first-class resource in your infrastructure pipeline, with the same governance controls you apply to an EKS cluster or an RDS instance. What I would do immediately: integrate the drift detection job into the same platform observability dashboard, not create a separate one — visibility needs to be where engineers already look. The hardest lesson I have learned in this type of migration: the real risk is not in Terraform, it is in the existing state you have not documented — invest time in the inventory before any import.

## Verdict: Adopt, with Migration Discipline

Terraform support for SageMaker Unified Studio is a genuinely significant change for platform teams in regulated environments. The integration via Cloud Control Provider is pragmatic — it is not the native `aws` provider, but it is functional and has the advantage of covering new resources before the traditional provider. The open-source `terraform-aws-sagemaker-unified-studio` module has the right structure: a sub-module hierarchy that maps to organizational ownership, support for existing roles for environments with centralized IAM, and sufficient examples to get started without building from scratch.

The recommendation is to adopt, but with migration discipline: (1) invest time in inventory and import before any apply; (2) configure an explicit 90min timeout on the domain; (3) use `existing_iam_roles` in environments with restrictive SCPs; (4) implement scheduled drift detection from day one, not as an afterthought; (5) treat blueprints as security resources — each enablement requires formal review. For teams that have not yet migrated, the cost of not doing so is growing: every additional month of ClickOps is one more month of audit evidence that will need to be manually reconstructed.

## References

- [AWS What's New: Amazon SageMaker Unified Studio now supports Terraform for provisioning (Jul 2, 2026)](https://aws.amazon.com/about-aws/whats-new/2026/07/amazon-sagemaker-unified-studio-terraform/)
- [terraform-aws-sagemaker-unified-studio — Open Source Module on GitHub (aws-ia)](https://github.com/aws-ia/terraform-aws-sagemaker-unified-studio)
- [Guidance for Developing a Data & AI Foundation with Amazon SageMaker — AWS Solutions](https://docs.aws.amazon.com/solutions/developing-a-data-and-ai-foundation-with-amazon-sagemaker/)
- [Quickly adopt new AWS features with the Terraform AWS Cloud Control Provider — AWS DevOps Blog](https://aws.amazon.com/blogs/devops/quickly-adopt-new-aws-features-with-the-terraform-aws-cloud-control-provider/)
- [Amazon SageMaker Domain in VPC only mode with Terraform — AWS ML Blog (Sep 2023)](https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-domain-in-vpc-only-mode-to-support-sagemaker-studio-with-auto-shutdown-lifecycle-configuration-and-sagemaker-canvas-with-terraform/)
- [AWS Config now supports 8 new resource types (Jun 2026)](https://aws.amazon.com/about-aws/whats-new/2026/06/aws-config-new-resource-types)
- [Amazon SageMaker Unified Studio Administrator Guide](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/)
- [AWS Control Tower AFT Provisioning Framework](https://docs.aws.amazon.com/controltower/latest/userguide/aft-provisioning-framework.html)
