About this article
This article is the sixth deep dive in the “Software Architecture” category of the Architecture Crash Course for the Generative-AI Era series, covering transaction design.
Transactions are the mechanism that guarantees "all succeed or all are undone" — the design core that prevents money from disappearing or appearing out of nowhere in operations like bank transfers. The article covers ACID properties, isolation levels, distributed transactions, eventual consistency, the Saga and Outbox patterns, the CAP theorem, and idempotency, along with axes for sorting the consistency level each business need requires.
More articles in this category
What keeps “money from disappearing”
Transaction design decides “which data needs what granularity of consistency” in the system. Some data — bank-account balances — needs strict consistency; some — SNS “like counts” — can drift slightly without harm. Applying maximum consistency to everything makes the system too heavy; sorting precision per business need is the design core.
It must be considered together with overall structure (especially microservices). If 1 DB is enough, ACID alone suffices; whether you can lean on a configuration that avoids distributed transactions ties directly to operational cost.
Consider it together with overall structure. If 1 DB is enough, ACID alone is enough.
ACID properties
The classic transaction guarantee RDBMSes have offered for decades is ACID — four properties, named by their initials, that together provide strong data-integrity guarantees.
ACID matters because applications’ biggest losses come from “money / inventory / order data inconsistency.” Once data drifts, root-cause investigation, correction, and customer-facing costs balloon, shaking trust in the entire business.
ACID was established alongside RDB development from the 1970s onward, designed on the assumption that strong integrity is guaranteed within one DB. It worked perfectly when a server and its DB were each one machine. In the cloud-era distributed environment, maintaining ACID across multiple DBs and services requires locking all nodes; one network outage stops everything, making it operationally impractical. That is why eventual consistency and Saga (covered later) emerged.
| Property | Substance |
|---|---|
| Atomicity | All-or-nothing; no partial state |
| Consistency | Constraint violations always fail; data never leaves a valid state |
| Isolation | Concurrent execution doesn’t interfere |
| Durability | Committed data survives failures |
Within a single DB, ACID is solidly sufficient — the default for finance, accounting, inventory, and other strong-consistency-required businesses.
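The all-or-nothing guarantee of atomicity can be sketched in application code. The `Ledger` type and `transfer` function below are hypothetical names for illustration — a real RDB provides this for you via BEGIN/COMMIT/ROLLBACK:

```typescript
// Minimal atomicity sketch: a transfer applies fully or not at all.
// `Ledger` and `transfer` are illustrative names, not a real library API.
type Ledger = Map<string, number>;

function transfer(ledger: Ledger, from: string, to: string, amount: number): boolean {
  // Work on a copy so a failed step leaves the original untouched ("rollback").
  const draft = new Map(ledger);
  const fromBalance = draft.get(from) ?? 0;
  if (fromBalance < amount) return false; // constraint violation -> abort, no partial state
  draft.set(from, fromBalance - amount);
  draft.set(to, (draft.get(to) ?? 0) + amount);
  // "Commit": publish both writes together.
  for (const [account, balance] of draft) ledger.set(account, balance);
  return true;
}

const ledger: Ledger = new Map([["alice", 100], ["bob", 0]]);
transfer(ledger, "alice", "bob", 30);  // succeeds: both sides update
transfer(ledger, "alice", "bob", 999); // aborts: balances unchanged
```

The point is that no interleaving ever exposes "alice debited but bob not credited" — exactly the state a real transaction forbids.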
Isolation levels
In ACID, the realistic trade-off with performance is “I” (Isolation). The level you pick determines how strictly other transactions are kept apart. Stricter raises consistency but lowers concurrent performance.
| Level | Allowed phenomena | Performance |
|---|---|---|
| READ UNCOMMITTED | See others’ uncommitted values | Fastest |
| READ COMMITTED | Same query may produce different results (non-repeatable read) | Fast |
| REPEATABLE READ | New rows may appear mid-transaction (phantom read) | Mid |
| SERIALIZABLE | All prevented (strictest) | Slow |
PostgreSQL defaults to READ COMMITTED, MySQL (InnoDB) to REPEATABLE READ. Higher isolation costs performance, so strengthen only the parts that need it.
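As a rough illustration of the difference, here is a toy model — not how real MVCC engines work internally — in which READ COMMITTED always re-reads the latest committed value while REPEATABLE READ pins a snapshot at first read. `Txn` and `Store` are invented names:

```typescript
// Toy isolation-level model (illustrative only, not a real DB client).
type Store = Map<string, number>;

class Txn {
  private snapshot = new Map<string, number>();
  constructor(private store: Store, private level: "READ COMMITTED" | "REPEATABLE READ") {}
  read(key: string): number | undefined {
    if (this.level === "REPEATABLE READ") {
      // Pin the value seen at first read; later reads repeat it.
      if (!this.snapshot.has(key)) this.snapshot.set(key, this.store.get(key)!);
      return this.snapshot.get(key);
    }
    return this.store.get(key); // always the latest committed value
  }
}

const store: Store = new Map([["stock", 10]]);
const rc = new Txn(store, "READ COMMITTED");
const rr = new Txn(store, "REPEATABLE READ");
rc.read("stock"); rr.read("stock"); // both see 10
store.set("stock", 7);              // another transaction commits an update
// rc now sees 7 (non-repeatable read); rr still sees 10
```

The same query returning different values mid-transaction under READ COMMITTED is precisely the "non-repeatable read" phenomenon in the table above.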
The difficulty of distributed transactions
In microservices and multi-DB configurations, transactions across multiple DBs sometimes become necessary. Traditionally 2PC (Two-Phase Commit — ask all DBs “prepared?” then commit together) solved this, but it has fatal flaws in the cloud era.
2PC forces a two-phase exchange of “all nodes prepared -> all nodes commit”, so any one node going down blocks everything. In cloud environments where network failures are routine, running 2PC in production frequently stops the whole service — “effectively non-functional” in practice.
The cloud-native principle is to avoid 2PC. Use eventual consistency instead.
Eventual Consistency
Eventual consistency allows a relaxed guarantee: “not consistent immediately, but consistent eventually.” Writes succeed instantly; propagation to other replicas and services happens asynchronously. So users may temporarily see stale data on screen.
Bank accounts can’t drift even momentarily, but “like counts,” “follower counts,” “review counts” on e-commerce sites work fine even if 1 second stale. Treating “data where slight drift is fine” as eventually consistent secures the system’s overall availability and scalability.
Examples: Amazon inventory counts, Twitter follower counts, Instagram likes, YouTube view counts.
Per data type, judge from business requirements whether strong consistency is required or eventual is sufficient.
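As a minimal sketch of the idea, a like counter can accept writes instantly into a local buffer and propagate them to the authoritative count later; `LikeCounter` and `flush` are illustrative names, not a real API:

```typescript
// Eventually consistent counter sketch: writes succeed instantly into a
// buffer; readers may briefly see a stale total until propagation runs.
class LikeCounter {
  private committed = 0; // what readers see (e.g., a replica or cache)
  private pending = 0;   // buffered increments not yet propagated
  like(): void { this.pending += 1; }       // write succeeds instantly
  read(): number { return this.committed; } // may be stale
  flush(): void {                           // the async propagation step
    this.committed += this.pending;
    this.pending = 0;
  }
}

const counter = new LikeCounter();
counter.like();
counter.like();
counter.read();  // still stale — harmless for a like count
counter.flush(); // "eventually" consistent
```

For a bank balance this staleness window would be unacceptable; for a like count, nobody notices.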
Saga pattern
Saga is a pattern that implements distributed transactions as “a chain of small local transactions + compensation.” In a hotel-booking flow, “reservation creation,” “payment,” “inventory reservation,” “notification” run on different services; failure mid-flow triggers compensation to undo earlier steps.
```mermaid
sequenceDiagram
    participant U as User
    participant R as Reservation svc
    participant P as Payment svc
    participant I as Inventory svc
    participant N as Notification svc
    U->>R: Hotel booking
    R->>P: Charge
    P-->>R: Charge success
    R->>I: Reserve inventory
    I-->>R: Out of stock
    Note over R,I: Failure -> compensate
    R->>P: Compensate: refund
    P-->>R: Refund done
    R-->>U: Booking failed (out of stock)
```
Unlike 2PC, this doesn’t lock all services; each step completes locally, so it works in cloud environments. But incomplete compensation design risks leaving half-done state.
Pseudocode for orchestration-style Saga: each step completes in its own local TX; on failure, compensation runs in reverse order.
```typescript
async function bookHotel(input: BookingInput) {
  const completed: Array<() => Promise<void>> = [];
  try {
    const reservation = await reservationSvc.create(input);
    completed.push(() => reservationSvc.cancel(reservation.id));
    const payment = await paymentSvc.charge(input.amount);
    completed.push(() => paymentSvc.refund(payment.id));
    await inventorySvc.reserve(input.roomId);
    completed.push(() => inventorySvc.release(input.roomId));
    await notificationSvc.notify(input.userId);
    return { ok: true, reservationId: reservation.id };
  } catch (err) {
    // Compensate in reverse order (idempotent assumed)
    for (const compensate of completed.reverse()) {
      await compensate().catch(logCompensationFailure);
    }
    throw err;
  }
}
```
Compensation must be idempotent. If retries running double don’t produce stable results, the saga itself becomes a new source of consistency incidents. In production, combine with the Outbox pattern to reliably synchronize message sending and DB updates — the standard approach.
Two Saga forms
Saga has two implementation styles, differing in where flow control sits.
| Style | Trait | Pros | Cons |
|---|---|---|---|
| Orchestration | Central orchestrator controls order | Visible flow, easy debugging | Central-aggregation risk |
| Choreography | Event-driven, services act autonomously | Loose coupling, scales | Hard to grasp the whole |
When business is complex and order matters, Orchestration. When simple and services are independent, Choreography.
Default is starting from Orchestration. Systems where the whole isn’t visible become operational hell.
Outbox pattern
Outbox harmonizes “DB write” and “message-queue send” in a single DB transaction. It prevents the recurring microservices trouble “DB succeeded but Kafka send failed” (Kafka: distributed message-queue infrastructure) — the standard pattern.
- Record the message to send into the `outbox` table together with the business data (same transaction).
- A separate process (Relay) reads `outbox` and sends to the message queue.
- Update the sent flag on success.
This prevents the accident of only one of “DB commit” or “send” succeeding. For event-driven microservice architecture, near-mandatory.
Saga + Outbox is the default; near-mandatory for microservices.
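The pattern can be sketched with an in-memory stand-in for the DB and queue. All names here (`placeOrder`, `relayOnce`, `outbox`) are invented for illustration:

```typescript
// Outbox sketch with an in-memory "DB". In a real system, the business row
// and the outbox row commit in ONE ACID transaction; the relay then drains
// unsent rows into the queue and flags them.
interface OutboxRow { id: number; payload: string; sent: boolean; }

const orders: string[] = [];
const outbox: OutboxRow[] = [];
const queue: string[] = []; // stands in for Kafka etc.

function placeOrder(orderId: string): void {
  // In a real DB these two writes are one transaction -> they succeed or fail together.
  orders.push(orderId);
  outbox.push({ id: outbox.length, payload: `order-created:${orderId}`, sent: false });
}

function relayOnce(): void {
  for (const row of outbox) {
    if (row.sent) continue;
    queue.push(row.payload); // send to the message queue
    row.sent = true;         // mark as sent so it is not re-delivered
  }
}

placeOrder("o-1");
relayOnce();
relayOnce(); // second pass sends nothing: the flag makes the relay idempotent
```

Because the outbox row commits with the order, "DB success / send fail" becomes "send merely delayed", which the relay repairs on its next pass.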
CAP theorem
The foundational constraint of distributed-system architecture is the CAP theorem: Consistency, Availability, and Partition tolerance cannot all be satisfied simultaneously.
Network partitions happen in reality, so “P” is essentially required. Practical choice becomes “CP (consistency-priority)” vs “AP (availability-priority).” Business requirements determine whether “err out but be accurate” or “keep running with stale data.”
| Choice | Representative systems | Business |
|---|---|---|
| CP (consistency) | Traditional RDB, MongoDB (configurable) | Banking, payments, inventory |
| AP (availability) | DynamoDB, Cassandra, Redis | SNS, e-commerce browsing, analytics |
Reverse from business requirements. Picking distributed DBs without CAP awareness produces unexpected behavior.
Idempotency
In distributed systems, "the same request arriving multiple times" due to network failures and timeouts is routine. Idempotency is the property that executing the same request any number of times yields the same result.
Without idempotent “payment APIs,” retries cause double charges. Idempotency is achieved by clients attaching a request ID and servers deduplicating by that ID — the default. By HTTP method, GET / PUT / DELETE are spec-defined idempotent; POST is not, requiring care.
In microservices, event-driven, and retry processing, idempotency is required; bolting it on later is hard, so design from the start.
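A minimal sketch of the request-ID deduplication described above — the names (`chargeOnce`, `results`) are hypothetical:

```typescript
// Server-side idempotency sketch: the client attaches a request ID and the
// server caches the result per ID, replaying it on retries.
const results = new Map<string, { charged: number }>();
let totalCharged = 0;

function chargeOnce(requestId: string, amount: number): { charged: number } {
  const prior = results.get(requestId);
  if (prior) return prior;    // duplicate request: replay the cached result
  totalCharged += amount;     // the side effect runs exactly once
  const result = { charged: amount };
  results.set(requestId, result);
  return result;
}

chargeOnce("req-42", 100);
chargeOnce("req-42", 100); // retry with the same ID: no double charge
```

In production the `results` store must itself be durable (same DB as the charge) so a server crash between charge and cache write cannot reopen the double-charge window.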
Decision criteria: choosing consistency level
The core of transaction design is identifying the consistency level needed per business. Demanding strong consistency for all data crashes performance; making everything eventual breaks the business.
The decision axis: “if this data is even briefly inconsistent, will there be financial, legal, or reputational damage?” Money-moving processing, inventory allocation, records subject to taxation or audit can’t tolerate even one cent / one unit drift, so strong consistency is mandatory; compromising here shakes the business itself.
Conversely, “data that works business-wise even with slight drift” like like counts, view counts, recommendations, log aggregations is fine with eventual consistency; applying strong consistency would sacrifice performance and scalability. Asking the business “how would it hurt if this data were a few seconds stale?” and reasoning back is most reliable.
| Business | Required consistency | Reason |
|---|---|---|
| Bank transfers / payments | Strong (ACID) | Drift = financial damage |
| Inventory | Strong or Saga | Overselling = refunds and trust loss |
| Order history | Strong | Tax / audit requirements |
| Likes, view counts | Eventual | Slight drift is harmless |
| Recommendations | Eventual | Stale is OK if working |
| Logs / analytics | Eventual | Scale over real-time |
Strong consistency is a high-cost requirement; limit it to where it’s truly needed.
Data-type × consistency-level ladder
Note: industry norms as of April 2026; refresh periodically.
“All strong consistency” is excess and “all eventual” is dangerous; sorting consistency level per data type is the operational core. Typical industry split:
| Data type | Consistency level | Reason | Implementation |
|---|---|---|---|
| Bank balance / payments | Strong (ACID, SERIALIZABLE) | 1-cent drift = damage | 1-DB ACID |
| Orders / inventory allocation | Strong | Overselling = refunds and trust loss | 1-DB ACID or Saga |
| Member registration / auth | Strong | Direct security risk | 1-DB ACID |
| Tax / audit records | Strong + tamper protection | Legal requirement | ACID + WORM |
| Cart info | Mid (READ COMMITTED) | Temporary drift OK | 1-DB TX |
| Likes / view counts | Eventual | Seconds-stale OK | Async aggregation |
| Recommendations | Eventual | Stale is OK if working | Batch recompute |
| Logs / analytics | Eventual | No real-time need | Kafka + DWH |
A practical isolation-level rule: take PostgreSQL's default READ COMMITTED as the baseline, and elevate only inventory allocation and financial transactions to SERIALIZABLE. Applying SERIALIZABLE to all tables drops concurrent performance severalfold; limit it to the parts that need it.
Strong consistency is high-cost; limit it to genuinely necessary business.
Distributed-transaction traps
Common ways microservice-crossing / multi-DB-crossing transactions fail. All produce production data inconsistency.
| Forbidden move | Why |
|---|---|
| 2PC (Two-Phase Commit) in cloud production | Network failure blocks all services; effectively nonfunctional |
| Retries without idempotency keys | Network failures cause double payments, double inventory decrements, multiple notifications |
| Saga without compensation design | Half-done state (charged but no inventory) remains; manual repair required |
| DB write and message-queue send in separate TX | Without Outbox, “DB success / Kafka send fail” breaks consistency |
| SERIALIZABLE on all data | Concurrent performance drops several-fold; limit to necessary places |
| Concurrent updates without optimistic lock | Lost-update accident; one update vanishes |
| Treating timeout = failure and retrying | Server might have succeeded; idempotency key required |
| Reading from DB replica then immediately writing | Replication lag (ms-seconds) overwrites with stale value |
| DIY distributed transactions | Building without knowing Saga / Outbox always breaks; lean on libraries (Temporal, etc.) |
Strictly speaking, the Knight Capital 2012 incident wasn’t a distributed-TX accident, but the scenario of “one machine left with old code + retry runaway” producing 45 minutes / $440M loss / company dissolution shows the horror of neglecting idempotency and consistency design (full details in Appendix: Major Incident Catalog).
“Distributed + retry + no idempotency” is the triple-landmine set for double processing.
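The lost-update row in the table above can be illustrated with a minimal optimistic-lock sketch: every row carries a version, and a write succeeds only if the version it read is still current. Illustrative types — in SQL this is the `UPDATE ... WHERE version = :read_version` idiom:

```typescript
// Optimistic locking sketch: compare-and-set on a version column.
interface Row { value: number; version: number; }

function update(row: Row, readVersion: number, newValue: number): boolean {
  if (row.version !== readVersion) return false; // someone else won: caller must re-read and retry
  row.value = newValue;
  row.version += 1;
  return true;
}

const row: Row = { value: 10, version: 1 };
// Two clients both read at version 1, then both try to write:
update(row, 1, 20); // first writer wins; version advances to 2
update(row, 1, 30); // second writer is rejected instead of silently overwriting
```

Without the version check, the second write would land and the first update would simply vanish — the classic lost-update accident.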
The AI-era lens
Even with AI-driven development as the assumption, the fundamental difficulty of distributed transactions isn’t solved by AI. If anything, as AI generates massive code in short time, the need for humans to carefully verify “are transactions correctly bracketing the work?” rises.
AI tends to generate "surface-correct-looking code" while missing consistency errors that cross DB boundaries — errors that only surface as data inconsistency in production.
| AI-era favorable | AI-era unfavorable |
|---|---|
| Complete in 1-DB ACID | Hand-written Saga across services |
| Standard patterns like Outbox | Custom eventual-consistency implementations |
| Trust DB-vendor features (Aurora Global, etc.) | Distributed TX written at app layer |
| Schema and constraints expressed in code | DB triggers and hidden consistency rules |
The AI-era iron rule: “lean on simple design; don’t exceed what AI can write or read.” Sticking with modular monolith + 1 DB ACID is overwhelmingly easier to operate in the AI era than building distributed TX across microservices.
In the AI era, “design that avoids distributed TX” is the smart choice; 1-DB-complete simplicity has value.
Common misreadings
- “ACID means it’s safe” -> ACID applies only within 1 DB. Across multiple services, different design (Saga + Outbox) is required.
- “SERIALIZABLE is safest” -> Too-strict isolation crushes concurrent performance. Strengthen only where needed.
- “Eventual consistency is engineer laziness” -> It’s a choice matched to business characteristics, not laziness. Applying strong consistency to like counts is the sign of immature design.
- “Retries fix everything” -> Retries without idempotency are the direct cause of double processing. Design idempotency keys before adding retries.
"The day we treated timeout = failure" (industry case)
A project added 3-retry to the client side as timeout protection for the payment API; the result was duplicate charges to a small number of users. Server-side processing had succeeded; only the response was delayed, and all retries went through.
Few developers haven’t watched a similar failure happen nearby. Retry implementation tends to come in optimistically, and the “probably failed, let’s resend” mindset is the textbook path to production double-charges.
Without an idempotency key, “timeout = failure” is almost never a safe assumption. The “don’t know if it succeeded” state across the network is routine in distributed systems. Design idempotency keys before adding retries — that’s the rule.
Distributed + retry + no idempotency = double-processing landmine. Triple-set is the safety condition.
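The scenario above — server-side success with a lost response — is harmless when the retry carries the same idempotency key. A minimal sketch with invented names (`serverCharge`, `payWithRetry`):

```typescript
// "Timeout != failure": the server succeeds but the first response is lost
// in transit; the client retries with the SAME idempotency key, so the
// server replays the cached receipt instead of charging twice.
const receipts = new Map<string, string>();
let charges = 0;

function serverCharge(key: string): string {
  const cached = receipts.get(key);
  if (cached !== undefined) return cached; // duplicate key: replay, no new charge
  charges += 1;                            // the side effect happens exactly once
  const receipt = `receipt-${key}`;
  receipts.set(key, receipt);
  return receipt;
}

let dropFirstResponse = true;
function flakyCall(key: string): string {
  const receipt = serverCharge(key); // server-side work DID succeed...
  if (dropFirstResponse) {
    dropFirstResponse = false;
    return "TIMEOUT";                // ...but the client never sees the response
  }
  return receipt;
}

function payWithRetry(key: string, maxTries = 3): string {
  for (let i = 0; i < maxTries; i++) {
    const result = flakyCall(key);
    if (result !== "TIMEOUT") return result;
  }
  throw new Error("gave up after retries");
}

const receipt = payWithRetry("pay-001"); // one charge, despite the retry
```

Remove the key lookup in `serverCharge` and the same retry loop produces two charges — the exact incident described above.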
What you must decide — what’s your project’s answer?
Articulate your project’s answer in 1-2 sentences for each:
- Per-data consistency requirement (strong / eventual)
- Isolation level (the lowest line business tolerates)
- Distributed-transaction handling (Saga / Outbox / avoid)
- Retry policy and idempotency
- Locking strategy (optimistic / pessimistic)
- CAP choice (CP / AP)
Common failure patterns
- Demanding strong consistency for all data, performance crashes -> Setting SERIALIZABLE “to be safe” crushes concurrent performance.
- Solving distributed TX with 2PC -> Effectively nonfunctional in cloud. Use Saga + Outbox.
- Incomplete compensation, half-done state remains -> In Saga, compensation design is critical. Treat it as seriously as the success path.
- Multi-execution from retry without idempotency -> Network failure causes double payments, inventory double-decrements, multiple notifications.
Recognize “consistency is a high-cost requirement.” Apply strong consistency only where genuinely needed.
How to make the final call
Transaction design’s core is reasoning back from “how much drift does the business tolerate”; never decide by technical preference. Treating bank-transfer-strict business (no 1-cent drift) and SNS like-count business (1-second stale fine) at the same intensity makes the system too heavy and breaks down.
The selection core is “sort consistency levels per data.” Strong consistency is high-cost; limit it to genuinely necessary business. Misjudging here produces either performance crash or business breakdown.
The modern default is “design avoiding distributed transactions.” 2PC is essentially nonfunctional in cloud, and microservice-crossing transactions can only be built with Saga + Outbox — at very high design cost.
In AI-driven development, “surface-correct but consistency-broken code” is easy to generate; the more distributed the TX, the higher the incident risk. If 1-DB ACID can complete things, that’s strongest; modular monolith + 1 DB is overwhelmingly easier to operate in the AI era. When distributed is unavoidable, build the Saga + Outbox + idempotency triple-set correctly.
Selection priority:
- Reverse from business requirements (limit strong consistency to genuinely needed parts).
- Prioritize 1-DB completion (distributed TX is high-cost).
- If distributed is needed, Saga + Outbox (avoid 2PC).
- Idempotency as a hard requirement (bolting on later is hard).
“Consistency is a high-cost requirement.” Concentrate it where needed; keep the rest light with eventual.
Summary
This article covered transaction design — ACID, isolation levels, Saga, Outbox, the CAP theorem, idempotency.
Sort consistency level per data, prioritize 1-DB completion, and use the Saga + Outbox + idempotency triple-set when distributed is needed. That is the realistic 2026 answer, AI era included.
The next article is the Software Architecture category’s final installment: authentication and sessions (server session / JWT / OAuth).
Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (23/89)