Who is Fernando F. Azevedo?

Fernando F. Azevedo is a Senior Solutions Architect at Banco Itaú with 16+ years of experience across AWS, event-driven architecture, DevSecOps, Data Mesh, AI and financial systems.

What technical topics does Fernando work with?

Fernando works with AWS, Kubernetes, Kafka, Data Mesh, Amazon Bedrock, RAG, DevSecOps, observability, financial systems and architecture communication using C4, ADRs and trade-off analysis.

Is Fernando available for professional conversations?

Fernando is currently building at Banco Itaú and is open to thoughtful conversations about architecture, cloud, AI, engineering leadership, community, podcasts and technical collaboration.

Security & Resilience · Comparison

AI Agents for Security and DevOps: Productivity or Risk?

Jun 18, 2026 · 8 min · expert · AI-assisted

AWS launched frontier agents for security testing and cloud operations, opening a real debate about how far AI autonomy can go in regulated environments. This article compares four deployment patterns — fully autonomous agent, semi-autonomous with human approval, assisted (copilot), and deterministic pipeline — using concrete criteria of risk, cost, latency, and compliance.

When AWS announces frontier agents for security testing and cloud operations, the right question isn't 'does it work?' — it's 'at what cost, with what controls, and in which regulatory context?' I've worked with financial-grade systems for over 16 years and have seen enough automation cycles to recognize the pattern: the technology arrives with productivity promises, and the governance architecture comes later, playing catch-up. This time the stakes are higher because we're talking about agents with access to real tools — security APIs, IAM permissions, code execution, calls to critical services. The difference between an agent that finds a misconfiguration and one that accidentally exploits it is, often, just a poorly scoped IAM policy. This article is an honest bake-off between four AI agent deployment patterns for security and DevOps, using criteria that matter in production.

What frontier agents are and why the financial context changes everything

Frontier agents, in the AWS context, are built on Amazon Bedrock Agents with access to action groups that invoke real tools: AWS Security Hub, GuardDuty, Systems Manager Run Command, Lambda, and even external APIs via HTTP. The difference from a RAG-enabled chatbot is fundamental: the agent doesn't just respond — it plans, executes, observes the result, and iterates. This is the ReAct loop (Reason + Act) operating on real infrastructure.
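The plan, execute, observe, iterate cycle can be sketched in a few lines. This is a minimal illustration and not the Bedrock Agents implementation: `plan_next_step`, `TOOLS`, and the sample finding text are hypothetical stand-ins for the model and the action groups.

```python
from typing import Callable, Optional

# Hypothetical tool registry standing in for Bedrock action groups.
TOOLS: dict[str, Callable[[str], str]] = {
    "list_findings": lambda arg: "sg-123 allows 0.0.0.0/0 on port 22",
    "describe_group": lambda arg: f"{arg}: ingress tcp/22 from 0.0.0.0/0",
}


def plan_next_step(goal: str, observations: list[str]) -> Optional[tuple[str, str]]:
    """Stub planner standing in for the LLM: returns (tool, argument), or None when done."""
    if not observations:
        return ("list_findings", goal)
    if len(observations) == 1:
        return ("describe_group", "sg-123")
    return None  # enough evidence gathered


def react_loop(goal: str, max_steps: int = 5) -> list[str]:
    """Reason, act, observe, repeat, with a hard cap on iterations."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = plan_next_step(goal, observations)
        if step is None:
            break
        tool, arg = step
        if tool not in TOOLS:
            raise ValueError(f"unknown tool: {tool}")
        observations.append(TOOLS[tool](arg))
    return observations
```

The `max_steps` cap is the important design choice here: a loop driven by a non-deterministic planner needs an explicit upper bound on how many actions it can take.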

In a regulated financial environment — PCI-DSS, SOC 2, BACEN 4.658, LGPD — every agent action needs to be auditable, reversible where possible, and bounded by least-privilege principles. The problem is that large language models (LLMs) are inherently non-deterministic. The same instruction can generate different action sequences across runs. For an ETL pipeline this is tolerable; for an agent that has permission to modify security groups or execute scripts on EC2 instances, it's a first-order operational risk.

Bedrock Agents offers traceability via InvokeAgent with enableTrace: true, which returns the agent's rationale and each tool call as trace events in the response stream; the caller has to persist them, for example to CloudWatch Logs. This is necessary, but not sufficient. Traceability without permission scope control is just a pretty log of the disaster.

The four patterns that actually exist in production

After working with platform and security teams across multiple contexts, I've identified four recurring patterns for how AI agents are deployed for security and DevOps. These aren't theoretical categories — they're real choices with real trade-offs.

Pattern 1 — Fully Autonomous Agent: The agent receives a high-level objective ('audit the security posture of this AWS account and remediate CIS Benchmark deviations') and executes without human intervention. Uses Bedrock Agents with action groups for Security Hub, Config, and IAM Access Analyzer. The primary risk is the blast radius of a wrong model decision — an incorrectly applied SCP rule can block production workloads.

Pattern 2 — Semi-Autonomous with Human Approval: The agent plans and proposes, but each destructive or high-impact action passes through a human approval checkpoint via Step Functions .waitForTaskToken. This is the pattern I advocate for regulated financial environments.

Pattern 3 — Assisted (Copilot): The agent only suggests; an engineer executes. All the intelligence is in runbook generation, log analysis, and alert triage. Zero execution autonomy.

Pattern 4 — Deterministic Pipeline with Punctual AI: Traditional automation (Step Functions, EventBridge, Lambda) with LLM invoked only for specific natural language tasks — classifying alert severity, generating an incident summary, or translating a CVE into business impact. The agent has no tools; it's just a text transformer within a controlled flow.
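For Pattern 2, the approval checkpoint can be expressed as a single Task state that pauses until a callback arrives. The sketch below builds that state as a Python dictionary in Amazon States Language. The function name `notify-approver`, the state names `EscalateOrAbort` and `ExecuteRemediation`, the `proposedAction` input field, and the 900-second timeout are illustrative assumptions.

```python
import json


def approval_state(notifier_function: str, timeout_seconds: int = 900) -> dict:
    """Build a Task state that waits for a human approval callback."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix pauses the execution until
        # SendTaskSuccess or SendTaskFailure is called with the token.
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": notifier_function,
            "Payload": {
                "proposedAction.$": "$.proposedAction",
                "taskToken.$": "$$.Task.Token",
            },
        },
        # An explicit timeout keeps the workflow from stalling on an absent approver.
        "TimeoutSeconds": timeout_seconds,
        "Catch": [{"ErrorEquals": ["States.Timeout"], "Next": "EscalateOrAbort"}],
        "Next": "ExecuteRemediation",
    }


state = approval_state("notify-approver")
print(json.dumps(state, indent=2))
```

The timeout and the `Catch` branch are what turn an approval SLA into something the workflow enforces rather than something the team merely promises.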

Figure: Autonomy Spectrum: Four Agent Patterns for Security and DevOps. Each pattern represents a different point on the autonomy × control axis; the diagram's edges show the decision and approval flow in each mode. Components by layer:

AI Layer (Bedrock): Bedrock Agent (ReAct loop); inline LLM (InvokeModel only)
Security Tools: Security Hub (findings API); GuardDuty (alerts); IAM Access Analyzer
Orchestration: Step Functions (waitForTaskToken); Lambda (action executor); EventBridge (trigger rules)
Human Control Plane: human approver (Slack / Console); engineer (copilot mode)
Audit & Observe: CloudWatch Logs (enableTrace=true); CloudTrail (all API calls)

Comparison: Four AI Agent Patterns for Security and DevOps

Criterion	P1 (Autonomous)	P2 (Semi-autonomous)	P3 (Copilot)	P4 (Pipeline + Punctual AI)
Blast radius	High: direct action, no brake	Medium: gated by approval	Zero: human executes	Low: LLM has no tools
Incident response latency	< 2 min (fully automatic)	2–15 min (awaits human)	15–60 min (human executes)	< 5 min (fixed pipeline + LLM)
Auditability (PCI-DSS / SOC 2)	Partial: requires enableTrace + detailed CloudTrail	High: each decision has recorded approval	Full: human is the auditable actor	High: deterministic flow + LLM log
Estimated monthly cost (100 events/day)	US$ 800–2,000 (tokens + Lambda + SM)	US$ 600–1,500 (tokens + SFN + notification)	US$ 200–500 (tokens only)	US$ 150–400 (minimal tokens + Lambda)
Prompt injection risk	Critical: agent executes what payload says	High: human can be deceived	Low: human validates before acting	Minimal: LLM has no execution tools
Fit for regulated environments (BACEN, PCI)	Not recommended without extensive additional controls	Adequate with defined approval SLA	Fully adequate	Fully adequate
Required IAM scope	Broad: over-permission risk	Medium: scoped by approval phase	Read-only for the agent	Minimal: only InvokeModel + logs

The real problem: IAM, prompt injection, and blast radius in tool-enabled agents

The most underestimated risk in autonomous security agents isn't the model hallucinating — it's the model being induced to act maliciously by data it processes. This is indirect prompt injection: a Security Hub finding containing a crafted payload in the description field can instruct the agent to execute unintended actions. In environments where the agent has ec2:AuthorizeSecurityGroupIngress or iam:AttachRolePolicy permissions, the impact is immediate and potentially irreversible.

Mitigation starts with IAM. For Pattern 2 (semi-autonomous), the agent's execution role must use restrictive IAM conditions. For example, for Security Hub remediation, the role should have securityhub:UpdateFindings only with condition StringEquals: aws:ResourceTag/Environment: sandbox. Production actions require a second role explicitly assumed after human approval, with sts:AssumeRole recorded in CloudTrail.
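The sandbox-only condition described above could look roughly like the policy document below, built as a Python dictionary. It mirrors the action and condition key named in the text; whether a given Security Hub action honors resource-tag condition keys is an assumption to verify against the service authorization reference before relying on it. The `Sid` value is hypothetical.

```python
import json


def sandbox_remediation_policy(environment: str = "sandbox") -> dict:
    """Build an IAM policy that allows finding updates only on resources tagged for one environment."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RemediateTaggedFindingsOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["securityhub:UpdateFindings"],
                "Resource": "*",
                # Assumption: the action supports this condition key.
                "Condition": {
                    "StringEquals": {"aws:ResourceTag/Environment": environment}
                },
            }
        ],
    }


policy = sandbox_remediation_policy()
print(json.dumps(policy, indent=2))
```

Keeping the production role in a separate policy, assumed only after approval, means the sandbox role above never needs to grow.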

For Pattern 4, the design is cleaner: the Lambda that invokes the model has only bedrock:InvokeModel in its policy. The LLM result is treated as untrusted data: it passes through a deterministic parser that extracts only expected fields (severity, category, estimated impact) before feeding Step Functions. This removes the execution path for prompt injection because the LLM never has access to tools; the residual risk is a manipulated classification, which the parser's fixed schema and the deterministic rules downstream keep bounded.
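A deterministic parser of this kind might look like the sketch below. The JSON field names, the severity levels, and the category taxonomy are assumptions chosen for illustration; the point is that anything outside the expected schema is rejected or dropped before it reaches the workflow.

```python
import json

ALLOWED_SEVERITIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}
ALLOWED_CATEGORIES = {"network", "identity", "data", "compute"}  # hypothetical taxonomy
MAX_IMPACT_CHARS = 280


def parse_llm_classification(raw: str) -> dict:
    """Extract only the expected fields from LLM output; reject anything malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("LLM output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("LLM output must be a JSON object")

    severity = str(data.get("severity", "")).upper()
    category = str(data.get("category", "")).lower()
    impact = data.get("estimated_impact", "")

    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unexpected severity: {severity!r}")
    if category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {category!r}")
    if not isinstance(impact, str):
        raise ValueError("estimated_impact must be a string")

    # Unknown keys are dropped; free text is truncated to a fixed length.
    return {
        "severity": severity,
        "category": category,
        "estimated_impact": impact[:MAX_IMPACT_CHARS],
    }
```

For example, an output that carries an extra `"run": "rm -rf /"` key parses to the three expected fields and nothing else, while an output with `"severity": "urgent"` raises `ValueError` and never reaches Step Functions.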

A critical operational detail: Bedrock Agents has a default 30-second timeout per tool invocation. In security workflows where a tool can take 2–3 minutes (e.g., running an AWS Config rule evaluation), this causes silent failures. Configure actionGroupExecutor with high-memory Lambda (512 MB+) and adjust the function timeout to 300 seconds, with retry configured in Bedrock Agents itself (maxRetries: 2, well-defined stopSequences).
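Whatever retry facility the platform provides, it helps to make slow-tool failures explicit in your own code as well. The sketch below is a generic bounded retry with exponential backoff, not a Bedrock Agents API: `ToolTimeout`, `invoke_with_retry`, and the delay values are hypothetical names and defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ToolTimeout(Exception):
    """Raised by a tool wrapper when a call exceeds its time budget (hypothetical)."""


def invoke_with_retry(
    call: Callable[[], T],
    max_retries: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on ToolTimeout up to max_retries times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return call()
        except ToolTimeout:
            if attempt >= max_retries:
                raise  # surface the failure instead of failing silently
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

With `max_retries=2` the call is attempted at most three times, and the final timeout propagates to the caller so it can be logged and alarmed on.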

Prompt Injection in Security Agents is a Real Attack Vector

If your agent processes security findings, application logs, or support tickets as input for action decisions, you have a prompt injection attack surface. An attacker who can write to CloudWatch Logs or create a finding in Security Hub can potentially influence agent behavior. Treat all input external to the agent as untrusted — sanitize, validate schema, and never let raw LLM output directly feed an API call with write permissions.
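As a rough sketch of the "sanitize and validate schema" step, the function below checks a finding's shape, strips control characters, bounds field lengths, and flags suspicious phrasing for human review. The length limits and the deny-list pattern are assumptions; pattern matching alone is not a defense against prompt injection, only one signal among several.

```python
import re

MAX_TITLE_CHARS = 200
MAX_DESCRIPTION_CHARS = 500
REQUIRED_FIELDS = {"Id": str, "Title": str, "Description": str}

# Hypothetical, deliberately small deny-list used only to flag findings for review.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous |prior )?instructions|system prompt",
    re.IGNORECASE,
)
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def sanitize_finding(finding: dict) -> dict:
    """Validate the expected fields and return a bounded, cleaned copy."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(finding.get(name), expected_type):
            raise ValueError(f"finding field {name!r} is missing or has the wrong type")

    # Replace control characters with spaces, then bound the length.
    description = CONTROL_CHARS.sub(" ", finding["Description"])[:MAX_DESCRIPTION_CHARS]
    return {
        "Id": finding["Id"],
        "Title": CONTROL_CHARS.sub(" ", finding["Title"])[:MAX_TITLE_CHARS],
        "Description": description,
        "flagged": bool(SUSPICIOUS.search(description)),
    }
```

Fields that are not in `REQUIRED_FIELDS` never reach the agent, which shrinks the surface an attacker can write into.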

Decision Matrix: Which Pattern to Use?

P1 — Fully Autonomous Agent

Pros

Minimal MTTD/MTTR — response in seconds
Scales without linear headcount cost
Ideal for sandbox and controlled red team environments

Cons

High blast radius without explicit circuit breakers
Not auditable for PCI-DSS without significant additional engineering
Critical prompt injection risk with write tools
Inevitably broad IAM scope

Only in non-production environments with read-only IAM or isolated sandbox

P2 — Semi-Autonomous with Human Approval

Pros

Real balance between speed and control
Each destructive action has human approval record
Step Functions waitForTaskToken is natively auditable
IAM can be scoped per workflow phase

Cons

2–15 min latency depends on human availability
Approval fatigue risk — humans approve without reviewing
Higher token cost per complete workflow

Recommended pattern for production in regulated environments

P3 — Assisted (Copilot)

Pros

Practically zero operational risk from the agent
Full auditability — human is the actor
Minimal token cost
Excellent for runbook generation and CVE analysis

Cons

Does not solve the alert scale problem
MTTR depends entirely on engineer availability
Underutilizes the agent's reasoning capability

Safe entry point for teams beginning with agents

P4 — Deterministic Pipeline + Punctual AI

Pros

Deterministic and end-to-end testable behavior
LLM without tools = no AI blast radius
Lowest cost — tokens only for natural language tasks
Easiest to audit and certify for compliance

Cons

Does not leverage agent planning capability
Business logic stays in code, not in the model
Less flexible for unanticipated scenarios

Best for well-defined use cases where compliance is non-negotiable

Agent observability: what to monitor beyond CloudWatch

A security agent without adequate observability is an opaque privileged actor in your AWS account. With enableTrace: true, Bedrock Agents emits trace events containing modelInvocationInput, modelInvocationOutput, rationale, invocationInput (for each tool), and observation; the invoking code is responsible for writing them to CloudWatch Logs. This is the minimum, and it is not sufficient.

For financial environments, I implement three additional layers:

1. Custom agent behavior metrics: A Lambda wrapper that instruments each agent invocation and publishes metrics to CloudWatch Metrics with dimensions AgentId, ActionGroup, ToolName, and DecisionOutcome. This enables alarms on ToolInvocationRate (abnormal spike in tool calls) and RemediationActionCount (number of remediation actions per hour).

2. CloudTrail correlation via Athena: Each Bedrock Agent sessionId is propagated as a tag in subsequent API calls via Lambda context. This allows, via Athena over CloudTrail S3, reconstructing exactly which API calls were made as a consequence of a specific agent decision — essential for forensic investigation.

3. Agent trust SLOs: I define an AgentDecisionAccuracy SLO based on sampling: a subset of agent decisions is reviewed by a human and classified as correct/incorrect. If the correct decision rate falls below 95% over a 7-day window, the agent is automatically downgraded to Pattern 3 (copilot) via feature flag in Parameter Store. This is the trust circuit breaker that most implementations ignore.
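The trust circuit breaker from the third layer reduces to a small calculation over sampled reviews. The sketch below assumes human reviews are available as timestamped records; `ReviewedDecision`, `agent_mode`, and the `min_samples` guard are hypothetical additions, while the 95% threshold and 7-day window come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReviewedDecision:
    reviewed_at: datetime
    correct: bool


def agent_mode(
    reviews: list[ReviewedDecision],
    now: datetime,
    threshold: float = 0.95,
    window_days: int = 7,
    min_samples: int = 20,  # assumption: too few reviews means no evidence for autonomy
) -> str:
    """Return 'autonomous' or 'copilot' based on sampled accuracy in the window."""
    cutoff = now - timedelta(days=window_days)
    recent = [r for r in reviews if r.reviewed_at >= cutoff]
    if len(recent) < min_samples:
        return "copilot"
    accuracy = sum(r.correct for r in recent) / len(recent)
    return "autonomous" if accuracy >= threshold else "copilot"
```

In a deployment, the returned mode would be written to the feature flag in Parameter Store that the agent entry point reads. Defaulting to copilot when samples are scarce keeps the breaker fail-safe.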

Agent governance at scale: what compliance frameworks don't yet cover

PCI-DSS v4.0, SOC 2 Type II, and BACEN 4.658 were written for deterministic systems. None of them has explicit controls for non-deterministic AI agents with execution capability. This creates a real governance gap that has to be closed by design rather than left for auditors to resolve.

The three governance problems I consistently encounter:

Segregation of duties (SoD): An agent that can both detect and remediate violates the SoD principle. The solution is architectural: the detection agent has a separate IAM role from the remediation agent, and the Step Functions approval workflow is the auditable crossing point.

Change management for agent actions: Every automatic remediation is technically a configuration change. In environments with ITSM (ServiceNow, Jira Service Management), the agent should create a change record before executing any action. This can be done via an action group that calls the ITSM API — the agent doesn't execute without a valid change ID.

Versioning and rollback of agent decisions: Unlike code, you can't simply roll back a language model to a previous version. What you can do is version the agentAliasId — each alias points to a specific agent version with a fixed set of action groups and system instructions. Maintain at least two versions in production and implement a fallback mechanism via Lambda that detects performance degradation and redirects to the previous version.
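The change-management rule above, that no action runs without a valid change ID, can be enforced by a thin wrapper around every remediation call. This is a sketch under assumptions: the `CHG` ID format is hypothetical, and `is_approved` stands in for whatever lookup your ITSM API provides.

```python
import re
from typing import Callable, Optional

CHANGE_ID_PATTERN = re.compile(r"^CHG\d{7}$")  # hypothetical ID format


class MissingChangeRecord(Exception):
    """Raised when a remediation is attempted without an approved change record."""


def execute_with_change_record(
    action: Callable[[], str],
    change_id: Optional[str],
    is_approved: Callable[[str], bool],
) -> str:
    """Run `action` only if a well-formed, approved change ID accompanies it."""
    if not change_id or not CHANGE_ID_PATTERN.match(change_id):
        raise MissingChangeRecord("no valid change ID supplied")
    if not is_approved(change_id):
        raise MissingChangeRecord(f"change {change_id} is not approved")
    return action()
```

Because the check lives in the executor rather than in the prompt, the model cannot talk its way past it.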

The reality is that compliance teams will ask for evidence that the agent cannot act outside the defined scope. The only convincing answer is to show the execution role's IAM policy, the Step Functions state machine with approval checkpoints, and CloudTrail showing that no action was executed without the corresponding approval token.

The Safest Agent is Not the Most Capable — It's the Best Scoped

The temptation is to give the agent all available tools to maximize its utility. In practice, each tool added to the action group increases the agent's action space and, consequently, the risk of unintended action. Start with the minimum set of tools needed for the specific use case, measure the value delivered, and add tools incrementally with risk review at each addition. An agent with 3 well-defined tools is more reliable and auditable than an agent with 15 tools that 'can do everything'.
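One way to keep tool growth deliberate is an explicit registry in which every tool records who reviewed its risk, and anything unregistered is refused. `ToolRegistry` and its method names are hypothetical; this is a sketch of the idea, not a Bedrock Agents feature.

```python
from typing import Any, Callable


class ToolRegistry:
    """Allowlist of tools the agent may call, each tied to a named risk reviewer."""

    def __init__(self) -> None:
        self._tools: dict[str, tuple[Callable[..., Any], str]] = {}

    def register(self, name: str, fn: Callable[..., Any], risk_reviewed_by: str) -> None:
        if not risk_reviewed_by:
            raise ValueError("a tool needs a named risk reviewer before it is added")
        self._tools[name] = (fn, risk_reviewed_by)

    def invoke(self, name: str, *args: Any) -> Any:
        if name not in self._tools:
            raise PermissionError(f"tool {name!r} is not in the agent's scope")
        fn, _reviewer = self._tools[name]
        return fn(*args)

    def names(self) -> list[str]:
        return sorted(self._tools)


registry = ToolRegistry()
registry.register("get_finding", lambda finding_id: {"id": finding_id}, "security-architecture")
```

Calling `registry.invoke("delete_bucket", "x")` raises `PermissionError`, which is the behavior you want when the model proposes a tool nobody reviewed.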

Numbers That Matter in Pattern Selection

~30s: default timeout per tool invocation in Bedrock Agents. Insufficient for Config rule evaluations; configure Lambda with 300s and retry in the agent.

95%: accuracy threshold to keep the agent in autonomous mode. Below this, the circuit breaker activates an automatic downgrade to copilot mode via Parameter Store.

~3x: relative cost of P1 vs P4 for 100 events/day. The autonomous agent spends ~3x more on tokens and Lambda executions per complete ReAct cycle.

Well-Architected Lenses for AI Agents in Security

Security

IAM least-privilege per workflow phase; KMS CMK to encrypt agent traces in CloudWatch; VPC endpoints for Bedrock in network-restricted environments; SCPs blocking out-of-scope actions even if the role permits them.

Reliability

Trust circuit breaker via accuracy SLO; fallback to Pattern 3 on degradation; idempotency in all remediation actions (check state before acting); DLQ for agent events that failed after retries.
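"Check state before acting" is what makes a remediation safe to retry. The sketch below applies it to an in-memory model of a security group; the dictionary shape and the function name are assumptions for illustration, and a real implementation would read and write through the EC2 API.

```python
def revoke_open_ssh(group: dict) -> bool:
    """Remove world-open SSH ingress rules. Returns True only if a change was made."""
    open_rules = [
        rule for rule in group["ingress"]
        if rule["port"] == 22 and rule["cidr"] == "0.0.0.0/0"
    ]
    if not open_rules:
        return False  # already compliant: nothing to do, safe to call again
    group["ingress"] = [rule for rule in group["ingress"] if rule not in open_rules]
    return True


group = {
    "id": "sg-123",
    "ingress": [
        {"port": 22, "cidr": "0.0.0.0/0"},
        {"port": 443, "cidr": "0.0.0.0/0"},
    ],
}
first = revoke_open_ssh(group)
second = revoke_open_ssh(group)
```

The second call is a no-op, so a retried or duplicated agent event cannot cause a second, unintended change.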

Anti-Patterns I See Repeatedly

Giving the agent a role with AdministratorAccess 'temporarily' — this is never temporary
Treating LLM output as trusted data and passing it directly to write API calls
Not having a kill switch mechanism — an SSM Parameter or feature flag that immediately disables the agent
Measuring success only by 'number of findings remediated' without measuring remediation false positive rate
Using the same agent for detection and remediation without IAM role segregation — violates SoD
Not versioning the agent's system instructions (system prompt) — behavior changes become impossible to track
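The kill switch in particular is cheap to build. A minimal sketch, assuming the flag is read through an injected function (in practice a wrapper around an SSM Parameter Store read); `guarded_invoke`, `AgentDisabled`, and the flag value `enabled` are hypothetical names.

```python
from typing import Callable


class AgentDisabled(Exception):
    """Raised when the kill switch is off or its state cannot be determined."""


def guarded_invoke(read_flag: Callable[[], str], run_agent: Callable[[], str]) -> str:
    """Check the kill switch on every invocation and fail closed."""
    try:
        flag = read_flag()
    except Exception as exc:
        # If the flag cannot be read, assume the agent must not run.
        raise AgentDisabled("kill switch state unknown; failing closed") from exc
    if flag.strip().lower() != "enabled":
        raise AgentDisabled(f"agent disabled by flag value {flag!r}")
    return run_agent()
```

Reading the flag on every invocation, rather than caching it, is what makes the switch take effect immediately.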

My Curation Note

Senior Solutions Architect

In practice, I would start any security agent project with Pattern 4 — deterministic pipeline with punctual LLM — and measure the delivered value for 60 days before considering migrating to Pattern 2. The most expensive lesson I've learned in financial systems is that the pressure to 'automate everything' frequently ignores the cost of a single incident caused by incorrect automation, which can outweigh months of productivity gain. Pattern 2 with Step Functions waitForTaskToken is elegant and auditable, but requires the operations team to have a defined response SLA for approvals — without this, you have an agent that stalls waiting for a human who is asleep. My concrete advice: implement the kill switch on day 1, measure AgentDecisionAccuracy from the start, and only expand the agent's tool scope when you have production data that justifies the trust.

Verdict: Autonomy is Earned, Not Granted

For regulated financial environments, Pattern 2 (semi-autonomous with human approval via Step Functions waitForTaskToken) is the correct architectural choice for AI agent security and DevOps operations. It delivers the real balance between response speed and auditable control, with phase-scoped IAM and native traceability. Pattern 4 is the right choice for well-defined use cases where compliance is non-negotiable and the team is still building confidence in the technology. Pattern 1 (fully autonomous) is only acceptable in completely isolated sandbox environments, with read-only IAM, as part of red team exercises — never in production without extensive additional controls that essentially transform it into Pattern 2. The central message is this: the autonomy of an AI agent in critical systems must be proportional to accumulated evidence of reliability, not enthusiasm for the technology. Start restricted, measure, expand with data.

Analyzed source: AWS launches frontier agents for security testing and cloud operations
