# Uber DOMA: How Domain Layers Tamed the Microservice Explosion

With over 2,200 microservices and uncontrollable cross-cutting dependencies, Uber faced a classic organizational scale problem disguised as a technical one. The answer was DOMA — Domain-Oriented Microservice Architecture — an approach that groups services into domains, exposes well-defined interfaces, and introduces abstraction layers to recover lost cohesion. This teardown reconstructs the architecture, analyzes the decisions, and points out what I'd do differently.

- URL: https://fernando.moretes.com/studies/uber-domain-oriented-microservices

- Markdown: https://fernando.moretes.com/studies/uber-domain-oriented-microservices/study.md?lang=en

- Type: Teardown

- Company: Uber

- Domain: Microservices

- Date: 2020-07-23

- Tags: microservices, domain-driven-design, uber, platform-engineering, architecture, scalability, organizational-design, api-gateway

- Reading time: 7 min

---

Microservices promise autonomy and speed. In practice, without structural discipline, they deliver a chaotic dependency graph that no single engineer can fully reason about. Uber reached 2,200+ services and had to invent a new organizational layer — not for the computers, but for the teams.

## Fact Sheet

- **Company:** Uber Technologies
- **Domain:** Urban mobility, delivery, logistics
- **Service count (pre-DOMA):** ~2,200 active microservices
- **DOMA article published:** July 2020
- **Core stack:** Go, Java, Python; gRPC; Kafka; Cadence (workflow orchestration)
- **Delivery network:** Multi-region, global footprint, owned data centers + hybrid cloud
- **Core problem:** Uncontrollable cross-cutting coupling between services with no clear domain boundaries
- **Adopted solution:** DOMA — Domain-Oriented Microservice Architecture with layers, gateways, and extensions

## The Problem: When Microservices Become a Distributed Monolith

Uber grew from a Python monolith to a microservice architecture over several years, driven by the need to scale independent teams across multiple products — Rides, Eats, Freight, ATG. The natural result of this growth was a service explosion: each team created what it needed, when it needed it, without a global view of dependencies.

The most visible symptom was what the engineering team internally called *dependency hell*: a payment service directly calling a route-mapping service; a notifications service coupled to pricing logic; dependency cycles that turned simple deployments into complex coordination events. With 2,200+ services, no individual engineer could maintain a reliable mental map of the system.

The root problem wasn't technical — it was organizational. Conway's Law operating at maximum scale: the system architecture mirrored the communication structure of the teams, which was itself chaotic because it grew organically without a taxonomy of responsibilities. The solution needed to attack both sides simultaneously: technical structure and organizational structure.

Beyond coupling, there was a discovery and reuse problem. Teams created duplicate services because they didn't know the functionality already existed elsewhere. The absence of explicit contracts between services meant that any internal change could — and frequently did — cause cascading breakages in consumers that should never have had direct access to that implementation.

## Reconstructed DOMA Architecture

Layered view of domain organization at Uber. Each domain exposes a Layer Interface (gateway) and may contain sub-domains. Internal services are opaque to consumers outside the domain. Extensions allow customization without modifying the core.

### 🌐 Clients & Edge

- Mobile App iOS / Android (user)
- Partner API External Consumers (edge)

### 🚪 API Gateway Layer

- API Gateway Auth / Rate Limit / Routing (edge)

### 🏗️ Domain: Rides

- Rides Layer Interface (Domain Gateway) (frontend)
- Dispatch Service (internal) (compute)
- Driver Matching (internal) (compute)
- Rides Extension (customization hook) (compute)

### 💳 Domain: Payments

- Payments Layer Interface (Domain Gateway) (frontend)
- Billing Service (internal) (compute)
- Fraud Detection (internal) (compute)

### 🗺️ Domain: Maps & ETA

- Maps Layer Interface (Domain Gateway) (frontend)
- Routing Engine (internal) (compute)
- ETA Prediction (internal) (ai)

### 📦 Shared Platform Layer

- Observability Platform Metrics / Tracing / Logs (compute)
- Kafka Event Bus (messaging)
- Cadence Workflow Orchestration (compute)

### 🗄️ Storage Layer (per domain)

- Rides DB (Schemaless / MySQL) (data)
- Payments DB (isolated) (data)
- Maps Cache (Redis / in-memory) (storage)

### Flows

- mobile -> apigw: HTTPS
- partner -> apigw: HTTPS
- apigw -> rides_gw: gRPC
- apigw -> pay_gw: gRPC
- apigw -> maps_gw: gRPC
- rides_gw -> dispatch: internal
- rides_gw -> matching: internal
- rides_gw -> rides_ext: extension
- rides_gw -> maps_gw: via interface
- rides_gw -> pay_gw: via interface
- pay_gw -> billing
- pay_gw -> fraud
- maps_gw -> routing
- maps_gw -> eta
- dispatch -> kafka: events
- billing -> kafka: events
- cadence -> rides_gw: orchestrates
- dispatch -> rides_db
- billing -> pay_db
- routing -> maps_cache
- dispatch -> observability
- billing -> observability
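
The boundary rule implied by these flows can be checked mechanically. The following is a minimal Python sketch, not Uber's tooling: it reuses the node names from the Flows list, while the `DOMAIN_OF` mapping, the `GATEWAYS` set, and the `boundary_violations` function are hypothetical names introduced here for illustration.

```python
# Domain membership for the nodes in the Flows list. The mapping is an
# assumption derived from the diagram sections above.
DOMAIN_OF = {
    "rides_gw": "rides", "dispatch": "rides", "matching": "rides",
    "rides_ext": "rides", "rides_db": "rides",
    "pay_gw": "payments", "billing": "payments", "fraud": "payments",
    "pay_db": "payments",
    "maps_gw": "maps", "routing": "maps", "eta": "maps", "maps_cache": "maps",
}
GATEWAYS = {"rides_gw", "pay_gw", "maps_gw"}

# The 22 edges from the Flows list, as (source, destination) pairs.
FLOWS = [
    ("mobile", "apigw"), ("partner", "apigw"),
    ("apigw", "rides_gw"), ("apigw", "pay_gw"), ("apigw", "maps_gw"),
    ("rides_gw", "dispatch"), ("rides_gw", "matching"), ("rides_gw", "rides_ext"),
    ("rides_gw", "maps_gw"), ("rides_gw", "pay_gw"),
    ("pay_gw", "billing"), ("pay_gw", "fraud"),
    ("maps_gw", "routing"), ("maps_gw", "eta"),
    ("dispatch", "kafka"), ("billing", "kafka"),
    ("cadence", "rides_gw"),
    ("dispatch", "rides_db"), ("billing", "pay_db"), ("routing", "maps_cache"),
    ("dispatch", "observability"), ("billing", "observability"),
]


def boundary_violations(flows):
    """Return edges that enter a domain from outside without targeting its gateway."""
    violations = []
    for src, dst in flows:
        dst_domain = DOMAIN_OF.get(dst)
        if dst_domain is None:
            continue  # edge and shared-platform nodes are not domain-owned
        crosses_boundary = DOMAIN_OF.get(src) != dst_domain
        if crosses_boundary and dst not in GATEWAYS:
            violations.append((src, dst))
    return violations
```

Run against `FLOWS`, the function returns an empty list. Adding a direct edge such as `("billing", "routing")` would be reported, because it reaches into the Maps domain without going through `maps_gw`.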

## How DOMA Works: Domains, Layers, and Contracts

DOMA introduces four central concepts that together restore order to a chaotic microservice ecosystem.

**1. Domains:** A domain is a collection of microservices representing a cohesive business area — Rides, Payments, Maps, Driver, etc. The granularity is not arbitrary: it follows team responsibility lines (Conway's Law deliberately applied, not accidental). Each domain has a clear owner, an independent roadmap, and an explicit responsibility boundary.

**2. Layers:** Domains are organized into hierarchical layers. Lower layers (infrastructure, business) cannot depend on upper layers (product, presentation). This unidirectional constraint is the key to eliminating dependency cycles. Uber's article names five layers, from bottom to top: Infrastructure → Business → Product → Presentation → Edge. A Maps service cannot depend on Rides logic; a Rides service can depend on Maps.
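
The one-way rule is simple enough to express as a predicate. The sketch below is illustrative only: the numeric ranks and the domain names in `LAYER_RANK` are assumptions chosen to match the Maps/Rides example, not Uber's actual layer assignment, and allowing calls within the same layer is also an assumption of this sketch.

```python
# Illustrative layer ranks: a lower number means a lower layer.
# These assignments are assumptions for the example only.
LAYER_RANK = {"storage-platform": 0, "maps": 1, "payments": 1, "rides": 2}


def may_depend_on(caller, callee):
    """A domain may call domains in its own layer or in any lower layer."""
    return LAYER_RANK[callee] <= LAYER_RANK[caller]


def layer_violations(edges):
    """Return the (caller, callee) pairs that point from a lower layer to a higher one."""
    return [(a, b) for a, b in edges if not may_depend_on(a, b)]
```

With these ranks, `may_depend_on("rides", "maps")` is true and `may_depend_on("maps", "rides")` is false, which is the asymmetry the paragraph describes.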

**3. Layer Interface (Domain Gateway):** Each domain exposes exactly one public interface — the Layer Interface. Services internal to the domain are completely opaque to the outside world. External consumers can only interact with the domain through this interface, which functions as an internal API Gateway. This creates a stable contract: the domain can freely refactor its internal implementation as long as it maintains the interface.

**4. Extensions:** Extensions are the customization-without-modification mechanism. Instead of a consumer forking a service or adding conditional logic to the core ("if product X, then..."), it registers an extension that is invoked at specific points in the domain's flow. This is essentially the Strategy/Plugin pattern applied at platform scale. This was critical for Uber because different products (Rides vs. Eats vs. Freight) need slightly different behaviors in the same base domains.
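
To make the gateway and extension ideas concrete, here is a minimal in-process sketch. It is not Uber's implementation: the `PaymentsGateway` class, the `charge` method, the request fields, and the `eats_service_fee` extension are all hypothetical, and a real gateway would sit behind an RPC boundary rather than a Python method call.

```python
from typing import Callable, Dict, List

# An extension receives the request and the current amount (in cents)
# and returns a possibly adjusted amount.
Extension = Callable[[dict, int], int]


class PaymentsGateway:
    """Single public entry point for a hypothetical Payments domain."""

    def __init__(self) -> None:
        self._extensions: Dict[str, List[Extension]] = {}

    def register_extension(self, product: str, ext: Extension) -> None:
        """Products plug in behavior here instead of editing the core."""
        self._extensions.setdefault(product, []).append(ext)

    def charge(self, request: dict) -> dict:
        """Public contract: consumers call this and nothing else."""
        amount = self._base_amount(request)
        # Extension point: run the hooks registered for this product, in order.
        for ext in self._extensions.get(request["product"], []):
            amount = ext(request, amount)
        return {"user_id": request["user_id"], "amount_cents": amount}

    def _base_amount(self, request: dict) -> int:
        # Stand-in for internal billing logic that outside callers never see.
        return request["fare_cents"]


def eats_service_fee(request: dict, amount: int) -> int:
    """Hypothetical product-specific rule: add a flat 99-cent fee."""
    return amount + 99


gateway = PaymentsGateway()
gateway.register_extension("eats", eats_service_fee)
```

The core `charge` path contains no `if product == ...` branches. The Eats-specific fee lives in code owned by the product team, and a Rides request passes through the same method untouched by it.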

The combination of these four elements creates what the Uber article calls "structured modularity" — modularity with explicit composition rules, not just arbitrary decomposition. The practical result is that the dependency graph, previously spaghetti, acquires a DAG (Directed Acyclic Graph) topology with well-defined layers.
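
Whether a dependency graph really is a DAG can be verified with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the `find_cycle` helper and the service names used with it are illustrative assumptions, not part of Uber's article.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(dependencies):
    """Return one dependency cycle as a list of nodes, or None if the graph is a DAG.

    `dependencies` maps each service to the set of services it calls.
    """
    try:
        # static_order() raises CycleError during iteration if a cycle exists.
        list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # The second element of args holds the cycle; its first and last nodes match.
        return err.args[1]
    return None
```

A layered graph such as `{"rides": {"maps", "payments"}, "maps": set(), "payments": set()}` yields `None`, while a loop like payments → maps → pricing → payments is returned as a list that starts and ends on the same node.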

## The Organizational Dimension: Why This Is as Much a Team Problem as a Code Problem

One aspect that the Uber article treats seriously and that many superficial analyses ignore: DOMA is not just a code reorganization. It is a reorganization of responsibilities and decision-making power.

Before DOMA, any team could create dependencies on any service. This seems democratic, but in practice meant that nobody was comprehensively responsible for anything. The Payments team couldn't refactor their billing service without coordinating with 40 direct consumers who had coupling to internal implementation details.

With DOMA, the Layer Interface creates a clear ownership boundary. The team that owns the domain is the sole arbiter of which functionalities are publicly exposed. This has real political consequences: product teams that previously could "jump the queue" by accessing internal services directly now need to negotiate additions to the public interface with the owning team. This is intentional friction — the kind of friction that forces design conversations instead of workarounds.

The extension mechanism is particularly clever from an organizational standpoint. It solves the classic platform problem: how does a central service serve the divergent needs of multiple products without becoming a bottleneck or an accumulator of conditional logic? DOMA's answer is to invert control — the domain defines extension points, products register their specific behaviors. The platform team doesn't need to know all use cases; product teams don't need access to the core.

This also has implications for hiring and onboarding. A new engineer on the Payments team only needs to understand the Payments domain and its interfaces with adjacent domains — not the entire system. Cognitive complexity is bounded by the domain boundary. In a system with 2,200 services without this structure, effective onboarding took months; with DOMA, the initial learning scope is drastically reduced.

## Decision Matrix: Alternatives Considered

### DOMA (adopted approach)

**Pros**
- Preserves team autonomy within the domain
- Explicit contracts via Layer Interface reduce accidental coupling
- Extensions allow customization without modifying the core
- Compatible with incremental migration — no big bang required

**Cons**
- Governance overhead: who decides domain boundaries?
- Layer Interface can become a latency bottleneck if poorly implemented
- Poorly managed extensions recreate the coupling problem at another level

**Verdict:** Best balance for Uber's scale and organizational maturity

### Consolidation into Modular Monolith

**Pros**
- Eliminates network latency between services in the same domain
- ACID transactions within the module without distributed coordination

**Cons**
- Infeasible for Uber's team scale — merge conflicts, coupled deployments
- Loses fault isolation benefits already achieved
- Migrating 2,200 services to a monolith is unacceptable risk

**Verdict:** Rejected — going backward doesn't solve the organizational problem

### Pure Service Mesh (Istio/Envoy) without domain reorganization

**Pros**
- Improves observability and traffic control without code changes
- Applies security policies (mTLS) transparently

**Cons**
- Doesn't solve the logical coupling problem — only adds visibility to it
- Doesn't address team ownership and responsibility issues

**Verdict:** Complementary, not a substitute — necessary but insufficient alone

### Centralized API Management (ESB-style)

**Pros**
- Centralized contract and versioning control

**Cons**
- Recreates the ESB bottleneck — single point of failure and deploy bottleneck
- Goes against the team autonomy philosophy Uber had already achieved
- Doesn't scale with 2,200 services without fragmenting the ESB itself

**Verdict:** Rejected — known anti-pattern at this scale

## Well-Architected Pillars Review

- **security**: **Positive:** The Layer Interface as the sole entry point to the domain creates a natural perimeter for applying authentication, authorization, and auditing. Internal services are directly inaccessible, reducing the attack surface. **Identified gap:** The article doesn't detail the authorization model between domains — in financial systems, explicit control of which domains can call which interfaces would be required, with auditing of each cross-domain call. Uber likely addresses this via their service mesh (Envoy + mTLS + RBAC), but the integration with DOMA is not publicly documented.
- **reliability**: **Positive:** Domain isolation limits the blast radius of failures. A degradation in the Maps domain doesn't directly propagate to Payments. Extensions, if implemented with circuit breakers, allow the main flow to continue even when a customization fails. **Gap:** The Layer Interface introduces an additional hop in each cross-domain call. If not implemented with adequate caching and circuit breakers, it can become a concentrated failure point. Uber uses Cadence for long-running workflows, which mitigates transient failures, but the resilience of the domain gateway itself needs to be treated wi
- **performance**: **Explicit trade-off:** DOMA adds latency to cross-domain calls by forcing routing through the Layer Interface instead of direct point-to-point calls. For Uber, where driver matching latency is critical (hundreds of milliseconds matter for the experience), this is a real cost. The article doesn't quantify this overhead. My conservative estimate is 1-5 ms per additional hop on well-optimized infrastructure: acceptable for most flows, but potentially significant on critical paths with multiple domain hops.
- **cost**: **Positive:** Better service reuse reduces duplication — teams stop recreating functionality that already exists in other domains. **Negative:** Each domain's Layer Interface needs to be a high-availability service, meaning additional redundancy, monitoring, and maintenance. With dozens of domains, this is a non-trivial operational cost. Uber has the scale to absorb this; smaller companies would need to evaluate whether the infrastructure overhead justifies the organizational gains.
- **sustainability**: Service reuse via Layer Interface reduces the proliferation of redundant instances, which has a positive impact on computational resource consumption. Less code duplication means less maintenance surface and a lower probability of keeping zombie services unnecessarily active. There is no public data on DOMA's impact on resource efficiency at Uber.
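
The per-hop estimate in the performance review compounds on paths that cross several domains. The arithmetic below is a back-of-the-envelope sketch: the 1 to 5 ms range is this teardown's own estimate, not a figure published by Uber, and the function assumes the hops happen serially.

```python
def added_latency_ms(gateway_hops, per_hop_ms=(1.0, 5.0)):
    """Return the (low, high) extra latency in ms for serial gateway hops."""
    low, high = per_hop_ms
    return gateway_hops * low, gateway_hops * high
```

For a request that passes through three domain gateways in sequence, `added_latency_ms(3)` gives `(3.0, 15.0)`, so the upper bound is already a visible slice of a latency budget measured in hundreds of milliseconds.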

> **What I'd Do Differently:** DOMA is one of the most honest and practical contributions I've seen published on microservice architecture at scale. It's not theory — it's a solution that emerged from real pain. But there are three points where I'd diverge or supplement.

**1. Contracts as first-class artifacts, not convention.** The Layer Interface is a powerful idea, but Uber's article doesn't detail how contracts are versioned, tested, and evolved. In financial systems where I've worked, the absence of explicit contract testing (Pact for consumer-driven contracts, Protolock for the Protocol Buffers definitions behind gRPC) means that "the interface didn't change" is a promise based on human discipline, not automated enforcement. I'd add contract testing as a mandatory gate in the CI/CD pipeline for any change to a Layer Interface.
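
As a rough picture of what such a gate checks, the sketch below compares two versions of an interface description and lists backward-incompatible changes. It is a hand-rolled stand-in, not the Pact or Protolock API: the `breaking_changes` function and the `{method: {field: type}}` schema shape are assumptions made for illustration, and the fields are treated as response fields that consumers read.

```python
def breaking_changes(old, new):
    """Compare two interface versions given as {method: {field: type_name}}.

    Removing a method, removing a field, or changing a field's type is
    reported as breaking. Added methods and fields are not.
    """
    problems = []
    for method, old_fields in old.items():
        if method not in new:
            problems.append(f"method removed: {method}")
            continue
        for field, type_name in old_fields.items():
            new_type = new[method].get(field)
            if new_type is None:
                problems.append(f"field removed: {method}.{field}")
            elif new_type != type_name:
                problems.append(
                    f"type changed: {method}.{field} {type_name} -> {new_type}"
                )
    return problems


# Hypothetical versions of a Payments interface.
v1 = {"Charge": {"amount_cents": "int64", "currency": "string"}}
v2 = {"Charge": {"amount_cents": "string", "currency": "string", "memo": "string"}}
```

A CI step could fail the build whenever `breaking_changes(v1, v2)` is non-empty. Here it reports the `amount_cents` type change and ignores the added `memo` field.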

**2. The extension model needs a centralized registry with visibility.** Extensions solve the customization problem but create a new discovery problem: which extensions are registered in which domain? Who owns them? In a system with dozens of domains and hundreds of extensions, this becomes a second-level dependency graph. I'd implement an extension registry with explicit ownership, SLOs, and alerts for orphaned extensions.
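
A registry of this kind does not need to be elaborate to be useful. The following is a minimal in-memory sketch under stated assumptions: `ExtensionRecord`, `ExtensionRegistry`, and their fields are hypothetical names, and a real registry would persist records and feed an alerting system rather than return lists.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class ExtensionRecord:
    name: str
    domain: str                 # domain whose extension point is used
    owner_team: Optional[str]   # None when no owner is recorded
    latency_slo_ms: int         # latency budget the extension commits to


class ExtensionRegistry:
    def __init__(self) -> None:
        self._records: List[ExtensionRecord] = []

    def register(self, record: ExtensionRecord) -> None:
        """Reject duplicate (domain, name) pairs so ownership stays unambiguous."""
        for existing in self._records:
            if (existing.domain, existing.name) == (record.domain, record.name):
                raise ValueError(f"already registered: {record.domain}/{record.name}")
        self._records.append(record)

    def by_domain(self, domain: str) -> List[ExtensionRecord]:
        """Answer: which extensions are registered in this domain?"""
        return [r for r in self._records if r.domain == domain]

    def orphaned(self, active_teams: Set[str]) -> List[ExtensionRecord]:
        """Extensions whose owner is missing or no longer an active team."""
        return [r for r in self._records if r.owner_team not in active_teams]
```

Feeding `orphaned()` the current team roster on a schedule is one way to produce the orphaned-extension alerts described above.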

**3. The security dimension needs to be a first-class citizen in the model.** DOMA defines organizational and technical boundaries, but doesn't explicitly define security boundaries.

## Transferable Lessons: What Any Organization Can Learn

DOMA is often dismissed as an "Uber thing", a solution for a scale most companies will never reach. This reading is mistaken. The principles are applicable well before you have 2,200 services.

**The microservice explosion starts early.** In my experience, teams of 30-50 engineers already suffer from the symptoms DOMA solves: undocumented cross-cutting dependencies, functionality duplication, slow onboarding. You don't need to wait for 2,000 services to introduce domain boundaries.

**Start with the dependency map, not the reorganization.** Before any structural change, make visible what exists. Tools like Backstage (CNCF) or even a simple graph generated from service mesh logs reveal coupling patterns. The map is the diagnosis; DOMA is one possible treatment.
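
A first version of that map can be very small. The sketch below assumes the call data has already been reduced to `(caller, callee)` pairs, for example from tracing or mesh access logs. It does not parse any real log format, and the function names are hypothetical.

```python
from collections import defaultdict


def build_dependency_map(call_records):
    """Collapse observed (caller, callee) pairs into {caller: set_of_callees}."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        if caller != callee:  # ignore self-calls
            graph[caller].add(callee)
    return dict(graph)


def fan_in(graph):
    """Rank services by number of distinct callers, highest first, ties by name."""
    counts = defaultdict(int)
    for callees in graph.values():
        for callee in callees:
            counts[callee] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Services at the top of the `fan_in` ranking are the ones whose internals the most teams depend on, which makes them natural first candidates for an explicit public interface.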

**The Layer Interface is applicable even without microservices.** In modular monoliths, the same principle applies: modules expose explicit public interfaces, internal implementations are private. The difference is that the boundary is package/module rather than network. The pattern is deployment-style agnostic.

**Extensions are the correct pattern for internal platforms.** Any team building an internal platform faces the dilemma: be too generic (useless) or too specific (not reusable). DOMA's extension model — define the variation points, let consumers inject the behavior — is the correct answer to this dilemma and applies regardless of scale.

## Verdict

DOMA is one of the most substantial architectural contributions published by a technology company in the last decade — not because it is revolutionary in theory, but because it is an honest and documented solution to a problem that most organizations face and few solve systematically. The combination of domains, layers, Layer Interface, and extensions is not just a technical reorganization: it is a governance framework that aligns code structure, team structure, and responsibility structure.

The weaknesses are real: latency overhead at domain gateways, the risk that poorly managed extensions recreate the original problem at another level, and the absence of an explicit security model integrated into the framework. But these are implementation problems, not design flaws.

What impresses me most about DOMA is the explicit acknowledgment that software architecture is inseparable from organizational design. There is no technical solution to a Conway's Law problem that doesn't also involve a change in team ownership and communication structure. Uber understood this and built a solution that attacks both sides.

## References

- [Uber Engineering — Microservice Architecture (DOMA)](https://www.uber.com/blog/microservice-architecture/)
- [Martin Fowler — Conway's Law](https://martinfowler.com/bliki/ConwaysLaw.html)
- [CNCF Backstage — Service Catalog](https://backstage.io/)
- [Uber — Cadence Workflow Orchestration](https://cadenceworkflow.io/)
- [AWS Well-Architected Framework](https://aws.amazon.com/architecture/well-architected/)
- [Sam Newman — Building Microservices (2nd ed.)](https://www.oreilly.com/library/view/building-microservices-2nd/9781492034018/)
- [Pact — Consumer-Driven Contract Testing](https://docs.pact.io/)

## Case sources

- [Uber — Microservice Architecture (DOMA)](https://www.uber.com/blog/microservice-architecture/)
