The path · 7
- 1 · Advanced · 8 min · Low-Carbon Computing with Retired Devices: A Real-World Edge Architecture
  Repurposing retired smartphones as edge compute nodes is not just a sustainability bet; it is an architectural decision with serious implications for reliability, security, and operational cost. In this article, I analyze the real mechanisms, failure modes, and engineering trade-offs that separate a lab experiment from a production platform.
- 2 · Advanced · 11 min · Pixels to Planning: Geospatial Data Platforms on AWS
  Earth AI platforms are leaving research labs and informing real operational decisions, from climate credit to infrastructure asset management. Architecting this pipeline on AWS requires precise choices about raster/vector ingestion, geospatial partitioning, inference latency, and data lineage. This article documents the decisions I would make today, the anti-patterns I have seen in the field, and the checklist you can act on tomorrow.
- 3 · Advanced · 7 min · CloudWatch to OTel: Tearing Down the Observability Bridge Pattern
  The CloudWatch-to-OpenTelemetry bridge pattern solves a real observability fragmentation problem in multi-platform environments, but it carries operational costs and design pitfalls that rarely surface in tutorials. In this article, I tear down the pattern's anatomy and examine when it makes sense and when it creates more problems than it solves.
- 4 · Advanced · 8 min · Custom Lens for Data Platforms: Anatomy of a Pattern
  The AWS Well-Architected Custom Lens is often treated as a documentation artifact, but when applied to enterprise data platforms, it becomes an operational governance mechanism with real teeth. In this article, I dissect the pattern's anatomy, expose its most common adoption failures, and propose a reference design that connects lens reviews to automated remediation pipelines.
- 5 · Advanced · 9 min · LLM Observability in Production: From GPU Metrics to Response Quality
  Deploying an LLM to SageMaker is the easy part. The hard part is knowing, in real time, whether it is answering well, using the GPU efficiently, and costing what you planned. This article details the observability stack I would build today for financial-grade LLM inference.
- 6 · Expert · 8 min · Agentic RAG with OpenSearch Serverless: Anatomy of a Pattern
  The agentic RAG pattern with OpenSearch Serverless promises elastic scale and semantic retrieval without infrastructure management, but it hides serious latency, cost, and consistency pitfalls that financial-grade systems cannot afford to ignore. In this article, I dissect the pattern's anatomy, map when it works and when it fails, and show how to configure it with production-grade rigor.
- 7 · Expert · 9 min · ML Observability on EKS: Logs, Metrics and Tracing Head-to-Head
  ML workloads on EKS generate telemetry volumes that expose the limits of any observability pipeline not designed for that profile. In this article, I compare four collection and routing approaches for logs and metrics, focusing on real cost, diagnostic latency, and fitness for regulated financial environments.
Deep-dive studies
- Teardown: Resilient Network Graphs and the Next-Generation AI Network
  An in-depth architectural analysis of the resilient graph-based data center networks AWS is building to support AI workloads at scale, covering topology, congestion control, energy efficiency, and the trade-offs that define the next generation of cloud infrastructure.
- ADR: OpenSearch Serverless vs Dedicated Vector Database for Agentic RAG
  This ADR evaluates vector search infrastructure options for a multi-tenant agentic RAG platform on AWS, comparing OpenSearch Serverless, dedicated vector databases (Pinecone, pgvector), and a self-managed hybrid search layer. The decision weighs cost, p99 latency, permission-based filtering, incremental ingestion, and native Bedrock Knowledge Bases integration.
- Design Doc: Frontier Model Governance on Bedrock with GPT, Claude, and Nova
  This document proposes an AI Gateway architecture to orchestrate and govern multiple frontier models (OpenAI GPT-5.5/GPT-4.5, Anthropic Claude, Amazon Nova, and specialized models) within Amazon Bedrock. The design covers intelligent routing, guardrails, a prompt registry, inference logging, per-tenant IAM, data residency, and fallback policy, with a focus on auditability and cost control in enterprise environments.
- ADR: Aurora Sharding (App-Level vs Aurora Limitless vs Citus)
  A high-growth OLTP workload exhausted the capacity of a single Aurora PostgreSQL writer. This ADR evaluates three sharding strategies (application-layer sharding, Aurora Limitless Database, and managed Citus/PostgreSQL), weighing operational complexity, cost, cross-shard query support, and migration risk.
- Design Doc: RDS Proxy for Lambda + RDS Without Melting the Database
  Lambda functions under high concurrency open hundreds of direct connections to RDS, exhausting the available connections and crashing the database. This document proposes RDS Proxy as a connection-multiplexing layer, details real connection-pinning pitfalls, compares alternatives such as the Data API and application-side poolers, and defines when RDS Proxy is not the right answer.
- Design Doc: Multi-Region Active-Active Payments API
  This document proposes a multi-region active-active architecture for a critical payments API, targeting near-zero RTO/RPO, deterministic conflict resolution in data replication, and a phased rollout that minimizes operational risk. The design is grounded in real financial engineering principles and AWS patterns, with explicit trade-offs between consistency, latency, and cost.