Software Architecture

Transaction Design — ACID / Eventual Consistency / Saga / Outbox

About this article

This article is the sixth deep dive in the “Software Architecture” category of the Architecture Crash Course for the Generative-AI Era series, covering transaction design.

The mechanism that guarantees “all-succeed-or-all-undo” — the design core that prevents accidents where money disappears or appears out of nowhere, as in bank transfers. The article covers ACID properties, isolation levels, distributed transactions, eventual consistency, the Saga and Outbox patterns, the CAP theorem, and idempotency, with decision axes for choosing the right consistency level per business need.

What keeps “money from disappearing”

Transaction design decides “which data needs what granularity of consistency” in the system. Some data — bank-account balances — needs strict consistency; some — SNS “like counts” — can drift slightly without harm. Applying maximum consistency to everything makes the system too heavy; sorting precision per business need is the design core.

It must be considered together with the overall structure (especially microservices). If a single DB is enough, ACID alone suffices; whether you can settle on a configuration that avoids distributed transactions ties directly to operational cost.

Consider it together with overall structure. If 1 DB is enough, ACID alone is enough.

ACID properties

The classic transaction guarantee RDBMSes have offered for decades is ACID — four properties, named by their initials, that strongly guarantee data integrity.

ACID matters because applications’ biggest losses come from “money / inventory / order data inconsistency.” Once data drifts, root-cause investigation, correction, and customer-facing costs balloon, shaking trust in the entire business.

ACID was established alongside RDB development from the 1970s, designed on the assumption that strong integrity is guaranteed within one DB. It worked perfectly when server and DB were one machine each. In the cloud-era distributed environment, maintaining ACID across multiple DBs and services requires locking all nodes; one network outage stops everything, making it operationally impractical. That’s why eventual consistency and Saga (covered later) emerged.

| Property | Substance |
| --- | --- |
| Atomicity | All-or-nothing; no partial state |
| Consistency | Always fail on constraint violations |
| Isolation | Concurrent execution doesn’t interfere |
| Durability | Committed data survives failures |

Within a single DB, ACID is solidly sufficient — the default for finance, accounting, inventory, and other strong-consistency-required businesses.

Isolation levels

In ACID, the realistic trade-off with performance is “I” (Isolation). The level you pick determines how strictly other transactions are kept apart. Stricter raises consistency but lowers concurrent performance.

| Level | Allowed phenomena | Performance |
| --- | --- | --- |
| READ UNCOMMITTED | See others’ uncommitted values (dirty read) | Fastest |
| READ COMMITTED | Same query may produce different results (non-repeatable read) | Fast |
| REPEATABLE READ | New rows may appear mid-transaction (phantom read) | Mid |
| SERIALIZABLE | All prevented (strictest) | Slow |

PostgreSQL defaults to READ COMMITTED, MySQL (InnoDB) to REPEATABLE READ. Higher isolation costs performance, so strengthen only the parts that need it.
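To make the non-repeatable-read distinction concrete, here is a toy in-memory sketch (not a real MVCC engine; the `ToyDb` class and its methods are illustrative): READ COMMITTED re-reads the latest committed value on every read, while REPEATABLE READ reads from a snapshot frozen at transaction start.

```typescript
// Toy model of two isolation levels (illustration only, not a real database).
type Level = "READ_COMMITTED" | "REPEATABLE_READ";

class ToyDb {
  private committed = new Map<string, number>();

  set(key: string, value: number): void {
    this.committed.set(key, value); // models another transaction committing
  }

  begin(level: Level) {
    // REPEATABLE READ: freeze a snapshot at transaction start.
    const snapshot =
      level === "REPEATABLE_READ" ? new Map(this.committed) : null;
    return {
      read: (key: string): number | undefined =>
        snapshot ? snapshot.get(key) : this.committed.get(key),
    };
  }
}

const db = new ToyDb();
db.set("balance", 100);

const rc = db.begin("READ_COMMITTED");
const rr = db.begin("REPEATABLE_READ");

db.set("balance", 50); // a concurrent transaction commits an update

// READ COMMITTED now sees the new value mid-transaction (non-repeatable read);
// REPEATABLE READ still sees its snapshot.
console.log(rc.read("balance")); // 50
console.log(rr.read("balance")); // 100
```

In a real database you would pick the level with `SET TRANSACTION ISOLATION LEVEL …` per transaction rather than globally, in line with “strengthen only the parts that need it.”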

The difficulty of distributed transactions

In microservices and multi-DB configurations, transactions across multiple DBs sometimes become necessary. Traditionally 2PC (Two-Phase Commit — ask all DBs “prepared?” then commit together) solved this, but it has fatal flaws in the cloud era.

2PC forces a two-phase exchange of “all nodes prepared -> all nodes commit”, so any one node going down blocks everything. In cloud environments where network failures are routine, running 2PC in production frequently stops the whole service — “effectively non-functional” in practice.

The cloud-native principle is to avoid 2PC. Use eventual consistency instead.

Eventual Consistency

Eventual consistency allows a relaxed guarantee: “not consistent immediately, but consistent eventually.” Writes succeed instantly; propagation to other replicas and services happens asynchronously. So users may temporarily see stale data on screen.

Bank accounts can’t drift even momentarily, but “like counts,” “follower counts,” “review counts” on e-commerce sites work fine even if 1 second stale. Treating “data where slight drift is fine” as eventually consistent secures the system’s overall availability and scalability.

Examples: Amazon inventory counts, Twitter follower counts, Instagram likes, YouTube view counts.
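As a toy sketch of the idea (assuming a single primary and one replica, with replication modeled as an explicit step rather than a real async process), eventual consistency can be pictured like this:

```typescript
// Toy primary/replica pair: writes land on the primary immediately,
// and a separate replication step copies them to the replica later.
class ReplicatedCounter {
  private primary = 0;
  private replica = 0;

  increment(): void {
    this.primary += 1; // the write succeeds instantly on the primary
  }

  readReplica(): number {
    return this.replica; // readers may see a stale value
  }

  replicate(): void {
    this.replica = this.primary; // async propagation, modeled explicitly
  }
}

const likes = new ReplicatedCounter();
likes.increment();
likes.increment();

console.log(likes.readReplica()); // 0 — stale, but harmless for a like count
likes.replicate();
console.log(likes.readReplica()); // 2 — "eventually" consistent
```

The window between `increment` and `replicate` is exactly the “1 second stale” period users may observe; the design question is only whether the business tolerates it.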

Per data type, judge from business requirements whether strong consistency is required or eventual is sufficient.

Saga pattern

Saga is a pattern that implements distributed transactions as “a chain of small local transactions + compensation.” In a hotel-booking flow, “reservation creation,” “payment,” “inventory reservation,” “notification” run on different services; failure mid-flow triggers compensation to undo earlier steps.

sequenceDiagram
    participant U as User
    participant R as Reservation svc
    participant P as Payment svc
    participant I as Inventory svc
    participant N as Notification svc
    U->>R: Hotel booking
    R->>P: Charge
    P-->>R: Charge success
    R->>I: Reserve inventory
    I-->>R: Inventory NG
    Note over R,I: Failure -> compensate
    R->>P: Compensate: refund
    P-->>R: Refund done
    R-->>U: Booking failed (out of stock)

Unlike 2PC, this doesn’t lock all services; each step completes locally, so it works in cloud environments. But incomplete compensation design risks leaving half-done state.

Pseudocode for orchestration-style Saga: each step completes in its own local TX; on failure, compensation runs in reverse order.

async function bookHotel(input: BookingInput) {
  const completed: Array<() => Promise<void>> = [];
  try {
    const reservation = await reservationSvc.create(input);
    completed.push(() => reservationSvc.cancel(reservation.id));

    const payment = await paymentSvc.charge(input.amount);
    completed.push(() => paymentSvc.refund(payment.id));

    await inventorySvc.reserve(input.roomId);
    completed.push(() => inventorySvc.release(input.roomId));

    await notificationSvc.notify(input.userId);
    return { ok: true, reservationId: reservation.id };
  } catch (err) {
    // Compensate in reverse order (idempotent assumed)
    for (const compensate of completed.reverse()) {
      await compensate().catch(logCompensationFailure);
    }
    throw err;
  }
}

Compensation must be idempotent. If retries running double don’t produce stable results, the saga itself becomes a new source of consistency incidents. In production, combine with the Outbox pattern to reliably synchronize message sending and DB updates — the standard approach.

Two Saga forms

Saga has two implementation styles, differing in where flow control sits.

| Style | Trait | Pros | Cons |
| --- | --- | --- | --- |
| Orchestration | Central orchestrator controls order | Visible flow, easy debugging | Central-aggregation risk |
| Choreography | Event-driven, services act autonomously | Loose coupling, scales | Hard to grasp the whole |

When business is complex and order matters, Orchestration. When simple and services are independent, Choreography.

Default is starting from Orchestration. Systems where the whole isn’t visible become operational hell.

Outbox pattern

The Outbox pattern reconciles “DB write” and “message-queue send” by recording the outgoing message in the same DB transaction as the business data. It prevents the recurring microservices trouble of “DB succeeded but the Kafka send failed” (Kafka: distributed message-queue infrastructure) — the standard pattern for this problem.

  1. Record the message-to-send into the outbox table together with business data.
  2. A separate process (Relay) reads outbox and sends to the message queue.
  3. Update the sent flag.

This prevents the accident of only one of “DB commit” or “send” succeeding. For event-driven microservice architecture, near-mandatory.
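The three steps above can be sketched with in-memory stand-ins for the DB tables and the queue (the names `OutboxRow`, `placeOrder`, and `relay` are illustrative; atomicity of step 1 is modeled by writing both rows in one function call):

```typescript
// In-memory sketch of the Outbox pattern. The "transaction" is modeled by
// writing the business row and the outbox row together in one atomic step.
interface OutboxRow { id: number; payload: string; sent: boolean; }

const orders: string[] = [];
const outbox: OutboxRow[] = [];
const queue: string[] = []; // stands in for Kafka or another message queue

let nextId = 1;

// Step 1: business data and the outgoing message are written together.
function placeOrder(item: string): void {
  orders.push(item);
  outbox.push({ id: nextId++, payload: `ordered:${item}`, sent: false });
}

// Steps 2-3: a separate relay process reads unsent rows, publishes them,
// and marks them sent. Safe to re-run: already-sent rows are skipped.
function relay(): void {
  for (const row of outbox) {
    if (!row.sent) {
      queue.push(row.payload); // send to the message queue
      row.sent = true;         // update the sent flag
    }
  }
}

placeOrder("book");
relay();
console.log(queue); // ["ordered:book"]
```

Because the relay re-reads unsent rows, a crash between sending and flag update can cause a duplicate send — which is why consumers must be idempotent (at-least-once delivery).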

Saga + Outbox is the default; near-mandatory for microservices.

CAP theorem

The foundation of distributed-system architecture is the CAP theorem: Consistency, Availability, and Partition tolerance can’t be simultaneously satisfied — that’s the constraint.

Network partitions happen in reality, so “P” is essentially required. Practical choice becomes “CP (consistency-priority)” vs “AP (availability-priority).” Business requirements determine whether “err out but be accurate” or “keep running with stale data.”

| Choice | Representative systems | Business |
| --- | --- | --- |
| CP (consistency) | Traditional RDB, MongoDB (configurable) | Banking, payments, inventory |
| AP (availability) | DynamoDB, Cassandra, Redis | SNS, e-commerce browsing, analytics |

Reverse from business requirements. Picking distributed DBs without CAP awareness produces unexpected behavior.

Idempotency

In distributed systems, “the same request arriving multiple times” due to network failures and timeouts is routine. Idempotency is the property that guarantees the result is the same no matter how many times the request is executed.

If a payment API is not idempotent, retries cause double charges. Idempotency is typically achieved by the client attaching a request ID and the server deduplicating by that ID — the default approach. By HTTP method, GET / PUT / DELETE are defined as idempotent by the spec; POST is not, requiring care.

In microservices, event-driven, and retry processing, idempotency is required; bolting it on later is hard, so design from the start.
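A minimal sketch of server-side deduplication by request ID (the `Map`-based store and the names are illustrative; production systems persist keys durably, usually with a TTL):

```typescript
// Server-side idempotency: the first call with a given key executes the
// charge; repeats return the cached result instead of charging again.
const results = new Map<string, string>();
let chargeCount = 0;

function chargeOnce(idempotencyKey: string, amount: number): string {
  const cached = results.get(idempotencyKey);
  if (cached !== undefined) return cached; // duplicate request: no new charge

  chargeCount += 1; // the actual side effect happens exactly once per key
  const receipt = `charged ${amount} (receipt #${chargeCount})`;
  results.set(idempotencyKey, receipt);
  return receipt;
}

const first = chargeOnce("req-123", 500);
const retry = chargeOnce("req-123", 500); // e.g. a client retry after timeout

console.log(first === retry); // true — same result, single charge
console.log(chargeCount);     // 1
```

The key must be generated by the client (one key per logical operation, reused across retries); if the server generates it per request, deduplication is impossible.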

Decision criteria: choosing consistency level

The core of transaction design is identifying the consistency level needed per business. Demanding strong consistency for all data crashes performance; making everything eventual breaks the business.

The decision axis: “if this data is even briefly inconsistent, will there be financial, legal, or reputational damage?” Money-moving processing, inventory allocation, records subject to taxation or audit can’t tolerate even one cent / one unit drift, so strong consistency is mandatory; compromising here shakes the business itself.

Conversely, “data that works business-wise even with slight drift” like like counts, view counts, recommendations, log aggregations is fine with eventual consistency; applying strong consistency would sacrifice performance and scalability. Asking the business “how would it hurt if this data were a few seconds stale?” and reasoning back is most reliable.

| Business | Required consistency | Reason |
| --- | --- | --- |
| Bank transfers / payments | Strong (ACID) | Drift = financial damage |
| Inventory | Strong or Saga | Overselling = refunds and trust loss |
| Order history | Strong | Tax / audit requirements |
| Likes, view counts | Eventual | Slight drift is harmless |
| Recommendations | Eventual | Stale is OK if working |
| Logs / analytics | Eventual | Scale over real-time |

Strong consistency is a high-cost requirement; limit it to where it’s truly needed.

Data-type × consistency-level ladder

Note: industry norms as of April 2026. Periodic refresh required.

“All strong consistency” is excess and “all eventual” is dangerous; sorting consistency level per data type is the operational core. Typical industry split:

| Data type | Consistency level | Reason | Implementation |
| --- | --- | --- | --- |
| Bank balance / payments | Strong (ACID, SERIALIZABLE) | 1-cent drift = damage | 1-DB ACID |
| Orders / inventory allocation | Strong | Overselling = refunds and trust loss | 1-DB ACID or Saga |
| Member registration / auth | Strong | Direct security risk | 1-DB ACID |
| Tax / audit records | Strong + tamper protection | Legal requirement | ACID + WORM |
| Cart info | Mid (READ COMMITTED) | Temporary drift OK | 1-DB TX |
| Likes / view counts | Eventual | Seconds-stale OK | Async aggregation |
| Recommendations | Eventual | Stale is OK if working | Batch recompute |
| Logs / analytics | Eventual | No real-time need | Kafka + DWH |

The isolation-level rule of thumb: use PostgreSQL’s default READ COMMITTED as the baseline, elevating only inventory allocation and financial transactions individually to SERIALIZABLE. Applying SERIALIZABLE to all tables drops concurrent performance several-fold; limit it to the necessary parts.

Strong consistency is high-cost; limit it to genuinely necessary business.

Distributed-transaction traps

Common ways microservice-crossing / multi-DB-crossing transactions fail. All produce production data inconsistency.

| Forbidden move | Why |
| --- | --- |
| 2PC (Two-Phase Commit) in cloud production | Network failure blocks all services; effectively nonfunctional |
| Retries without idempotency keys | Network failures cause double payments, double inventory decrements, multiple notifications |
| Saga without compensation design | Half-done state (charged but no inventory) remains; manual repair required |
| DB write and message-queue send in separate TX | Without Outbox, “DB success / Kafka send fail” breaks consistency |
| SERIALIZABLE on all data | Concurrent performance drops several-fold; limit to necessary places |
| Concurrent updates without optimistic lock | Lost-update accident; one update vanishes |
| Treating timeout = failure and retrying | Server might have succeeded; idempotency key required |
| Reading from DB replica then immediately writing | Replication lag (ms-seconds) overwrites with stale value |
| DIY distributed transactions | Building without knowing Saga / Outbox always breaks; lean on libraries (Temporal, etc.) |

Strictly speaking, the Knight Capital 2012 incident wasn’t a distributed-TX accident, but the scenario of “one machine left with old code + retry runaway” producing 45 minutes / $440M loss / company dissolution shows the horror of neglecting idempotency and consistency design (full details in Appendix: Major Incident Catalog).

“Distributed + retry + no idempotency” is the triple-landmine set for double processing.

The AI-era lens

Even with AI-driven development as the assumption, the fundamental difficulty of distributed transactions isn’t solved by AI. If anything, as AI generates massive code in short time, the need for humans to carefully verify “are transactions correctly bracketing the work?” rises.

AI tends to generate “surface-correct-looking code” while missing consistency errors crossing DB boundaries, only surfacing as data inconsistency in production.

| AI-era favorable | AI-era unfavorable |
| --- | --- |
| Complete in 1-DB ACID | Hand-written Saga across services |
| Standard patterns like Outbox | Custom eventual-consistency implementations |
| Trust DB-vendor features (Aurora Global, etc.) | Distributed TX written at app layer |
| Schema and constraints expressed in code | DB triggers and hidden consistency rules |

The AI-era iron rule: “lean on simple design; don’t exceed what AI can write or read.” Sticking with modular monolith + 1 DB ACID is overwhelmingly easier to operate in the AI era than building distributed TX across microservices.

In the AI era, “design that avoids distributed TX” is the smart choice; 1-DB-complete simplicity has value.

Common misreadings

  • “ACID means it’s safe” -> ACID applies only within 1 DB. Across multiple services, different design (Saga + Outbox) is required.
  • “SERIALIZABLE is safest” -> Too-strict isolation crushes concurrent performance. Strengthen only where needed.
  • “Eventual consistency is engineer laziness” -> It’s a choice matched to business characteristics, not laziness. Applying strong consistency to like counts is the sign of immature design.
  • “Retries fix everything” -> Retries without idempotency are the direct cause of double processing. Design idempotency keys before adding retries.

“The day we treated timeout = failure” (industry case)

A project added 3-retry to the client side as timeout protection for the payment API; the result was duplicate charges to a small number of users. Server-side processing had succeeded; only the response was delayed, and all retries went through.

Few developers haven’t watched a similar failure happen nearby. Retry implementation tends to come in optimistically, and the “probably failed, let’s resend” mindset is the textbook path to production double-charges.

Without an idempotency key, “timeout = failure” is almost never a safe assumption. The “don’t know if it succeeded” state across the network is routine in distributed systems. Design idempotency keys before adding retries — that’s the rule.
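The scenario above can be sketched as follows (all names are illustrative; the “lost response” is simulated by having the client ignore the first reply): the server succeeds on the first attempt, the client retries, and because one idempotency key is reused across attempts, the charge still happens exactly once.

```typescript
// Simulated flaky network: the first response is "lost", so the client
// retries. One idempotency key across all attempts prevents double charges.
const seen = new Map<string, string>();
let charges = 0;

function serverCharge(key: string): string {
  const prior = seen.get(key);
  if (prior !== undefined) return prior; // duplicate: return the same receipt
  charges += 1;                          // the real side effect, once per key
  const receipt = `ok-${charges}`;
  seen.set(key, receipt);
  return receipt;
}

function clientChargeWithRetry(key: string, maxAttempts: number): string {
  let lastResult = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    lastResult = serverCharge(key);       // the server may succeed even if...
    const responseLost = attempt === 1;   // ...the first response times out
    if (!responseLost) return lastResult; // only later responses arrive
  }
  return lastResult;
}

const receipt = clientChargeWithRetry("payment-42", 3);
console.log(receipt); // "ok-1"
console.log(charges); // 1 — retried, but charged exactly once
```

Generate the key once per logical payment, before the first attempt, and reuse it verbatim on every retry; a fresh key per attempt would defeat the deduplication entirely.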

Distributed + retry + no idempotency = double-processing landmine. Triple-set is the safety condition.

What you must decide — what’s your project’s answer?

Articulate your project’s answer in 1-2 sentences for each:

  • Per-data consistency requirement (strong / eventual)
  • Isolation level (the lowest line business tolerates)
  • Distributed-transaction handling (Saga / Outbox / avoid)
  • Retry policy and idempotency
  • Locking strategy (optimistic / pessimistic)
  • CAP choice (CP / AP)

Common failure patterns

  • Demanding strong consistency for all data, performance crashes -> Setting SERIALIZABLE “to be safe” crushes concurrent performance.
  • Solving distributed TX with 2PC -> Effectively nonfunctional in cloud. Use Saga + Outbox.
  • Incomplete compensation, half-done state remains -> In Saga, compensation design is critical. Treat it as seriously as the success path.
  • Multi-execution from retry without idempotency -> Network failure causes double payments, inventory double-decrements, multiple notifications.

Recognize “consistency is a high-cost requirement.” Apply strong consistency only where genuinely needed.

How to make the final call

Transaction design’s core is reasoning back from “how much drift does the business tolerate”; never decide by technical preference. Treating bank-transfer-strict business (no 1-cent drift) and SNS like-count business (1-second stale fine) at the same intensity makes the system too heavy and breaks down.

The selection core is “sort consistency levels per data.” Strong consistency is high-cost; limit it to genuinely necessary business. Misjudging here produces either performance crash or business breakdown.

The modern default is “design avoiding distributed transactions.” 2PC is essentially nonfunctional in cloud, and microservice-crossing transactions can only be built with Saga + Outbox — at very high design cost.

In AI-driven development, “surface-correct but consistency-broken code” is easy to generate; the more distributed the TX, the higher the incident risk. If 1-DB ACID can complete things, that’s strongest; modular monolith + 1 DB is overwhelmingly easier to operate in the AI era. When distributed is unavoidable, build the Saga + Outbox + idempotency triple-set correctly.

Selection priority:

  1. Reverse from business requirements (limit strong consistency to genuinely needed parts).
  2. Prioritize 1-DB completion (distributed TX is high-cost).
  3. If distributed is needed, Saga + Outbox (avoid 2PC).
  4. Idempotency as a hard requirement (bolting on later is hard).

“Consistency is a high-cost requirement.” Concentrate it where needed; keep the rest light with eventual.

Summary

This article covered transaction design: ACID, isolation levels, Saga, Outbox, the CAP theorem, and idempotency.

Sort consistency level per data type, prioritize 1-DB completion, and use the Saga + Outbox + idempotency triple-set when distribution is needed. That is the realistic 2026 answer, AI era included.

The next article is the Software Architecture category’s final installment: authentication and sessions (server session / JWT / OAuth).

Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book

I hope you’ll read the next article as well.