About this article
As the sixth installment of the “Data Architecture” category in the series “Architecture Crash Course for the Generative-AI Era,” this article explains streaming.
Question the "real-time" requirement and 90% of the time the answer turns out to be "a 5-minute-delay batch is enough." This article covers streaming-platform selection (Kafka/Kinesis/Pub-Sub/Flink/ksqlDB), Exactly-Once guarantees, window processing, and decision criteria, presenting the practical iron rule: first question whether real-time is truly needed.
Other articles in this category
What streaming handles
At the core sits a message queue (Kafka, Kinesis, Pub/Sub) that retains events as a persistent log and delivers them to multiple consumers simultaneously. The defining characteristic is that the data never stops; it keeps flowing - a fundamentally different worldview from batch.
Choose streaming only when truly needed. Operational cost is 10x batch.
Why it’s needed
1. Sub-second delay directly drives business value
Fraud detection, inventory synchronization, price synchronization, IoT control - "a few minutes later" is too late. The path from event occurrence to decision must complete in under a second.
2. Batch windows are disappearing
In 24/7 global operations, there is no quiet period left for an "overnight batch." Constantly flowing data has to be processed as it arrives.
3. Loose coupling between microservices
In microservices architectures, connecting services via events is mainstream. The streaming platform functions as the central nervous system of inter-service communication.
Batch vs streaming
| | Batch processing | Streaming |
|---|---|---|
| Processing unit | Bundled data | 1 event to a few |
| Delay | Hours to days | Milliseconds to seconds |
| Implementation | Relatively easy | Hard; operationally heavy |
| Cost | Cheap | Expensive |
| Retry | Easy redo | Hard to design |
| Representative tech | Spark, dbt | Kafka, Flink |
Most business requirements are fine with batch; situations where streaming is truly necessary are limited. If "looks roughly real-time" is enough, a 15-minute microbatch is often a sufficient substitute.
Main components
A streaming platform splits into "the layer that carries events" and "the layer that processes events." The former is the message queue (Kafka, etc.); the latter is the stream-processing engine (Flink, etc.). Their roles differ, so select them separately.
```mermaid
flowchart LR
PROD[Producer<br/>business systems/IoT]
MQ["Message queue<br/>(carrier layer)<br/>Kafka/Kinesis/Pub-Sub"]
PROC["Processing engine<br/>(aggregate/transform/join)<br/>Flink/ksqlDB"]
SINK1[(DWH/<br/>BigQuery)]
SINK2[(Search<br/>Elasticsearch)]
SINK3[Realtime<br/>dashboard]
PROD -->|Event| MQ
MQ --> PROC
PROC --> SINK1
PROC --> SINK2
PROC --> SINK3
MQ -. direct sink .-> SINK1
classDef prod fill:#fef3c7,stroke:#d97706;
classDef mq fill:#dbeafe,stroke:#2563eb,stroke-width:2px;
classDef proc fill:#fae8ff,stroke:#a21caf;
classDef sink fill:#dcfce7,stroke:#16a34a;
class PROD prod;
class MQ mq;
class PROC proc;
class SINK1,SINK2,SINK3 sink;
```
| Layer | Role | Representatives |
|---|---|---|
| Message queue | Persist and deliver events | Kafka, Kinesis, Pub/Sub |
| Stream-processing engine | Aggregate, transform, join | Flink, ksqlDB, Spark Streaming |
| Schema management | Define message types | Schema Registry, Protobuf |
Apache Kafka (industry-standard message queue)
Apache Kafka is OSS that originated at LinkedIn and is the de facto standard for streaming platforms. Its defining features are high throughput (millions of events per second), a design that persists events as a log, and a mechanism that lets multiple consumers read independently - it is adopted by large companies worldwide such as Netflix, Uber, and LINE. Confluent Platform (commercial) and Confluent Cloud (managed) are also options.
Its strength is high performance and scalability, but the price is an extremely heavy operational load: ZooKeeper management (replaced by KRaft in recent versions), broker and partition design, consumer-group coordination - serious use is hard without a dedicated ops team.
| Pros | Cons |
|---|---|
| Overwhelming performance, track record | Heavy operational load |
| OSS, thin vendor lock-in | High learning cost |
| Rich ecosystem (Connect, Streams, etc.) | Excessive at small scale |
| Low latency (millisecond order) | High cluster-design difficulty |
If you can self-operate Kafka, it is the strongest option; if not, consider a managed alternative (Kinesis / Pub/Sub / Confluent Cloud).
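As a concrete image of the "persist events as a log, multiple consumers read independently" model, here is a minimal sketch using the confluent-kafka Python client. The broker address, topic name, and consumer group are hypothetical placeholders, and error handling is largely omitted.

```python
from confluent_kafka import Producer, Consumer

# Hypothetical broker and topic names for illustration
BROKERS = "localhost:9092"
TOPIC = "orders"

# Producer: append an event to the topic (the persistent log)
producer = Producer({"bootstrap.servers": BROKERS})
producer.produce(TOPIC, key="order-123", value=b'{"amount": 4200}')
producer.flush()  # block until the broker acknowledges delivery

# Consumer: each consumer group keeps its own read position (offset),
# so another group (e.g. "analytics") can read the same log independently
consumer = Consumer({
    "bootstrap.servers": BROKERS,
    "group.id": "fraud-detection",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```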
Managed queues (Kinesis / Pub/Sub / Event Hubs)
Cloud-vendor-provided Kafka alternatives. The cloud provider handles operations, eliminating worries about scaling, availability, and backup - the biggest appeal is that even a small team can run a streaming platform. The standard choices are Kinesis on AWS, Pub/Sub on GCP, and Event Hubs on Azure.
| Pros | Cons |
|---|---|
| Near-zero operations | Cloud lock-in |
| Easy to start small | Can be more expensive at large scale |
| Easy integration with other managed services | Fine-grained tuning is hard |
| Cloud handles failures | Kafka-specific features unavailable |
Representatives: Amazon Kinesis Data Streams, Google Pub/Sub, Azure Event Hubs, Confluent Cloud
The modern rule: managed first, migrate to Kafka if you hit throughput limits.
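To illustrate how little code a managed queue needs on the producer side, here is a minimal boto3 sketch that publishes an event to Kinesis Data Streams. The stream name is hypothetical, and AWS credentials and region are assumed to be configured in the environment.

```python
import json
import boto3

# Hypothetical stream name; credentials/region come from the environment
kinesis = boto3.client("kinesis", region_name="ap-northeast-1")

event = {"event_type": "page_view", "user_id": "u-42", "ts": "2026-01-01T00:00:00Z"}

# PartitionKey decides the shard; events with the same key keep their order
kinesis.put_record(
    StreamName="clickstream",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],
)
```

The consumer side is typically a Lambda trigger or a Kinesis Client Library application; neither requires broker or partition management.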
Apache Flink (top processing engine)
Apache Flink is OSS specialized in stateful stream processing, executing complex aggregations, joins, and event-time processing at millisecond latency. Used by Uber, Alibaba, and Stripe at the scale of tens of billions of events per day, it is the serious option that implements Exactly-Once most reliably.
On the other hand, its operational difficulty exceeds even Kafka's - checkpoint design, state-backend selection, job-restart management - and the learning cost is very high. Managed versions exist, such as Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) and Alibaba Cloud Realtime Compute, and adopting Flink through these to lower the operational load is the realistic path.
| Pros | Cons |
|---|---|
| Low latency, high throughput | High learning cost |
| Flexible to write complex processing | High operational difficulty |
| Robust Exactly-Once | Excessive at small scale |
| Strong event-time processing | Java/Scala primary (Python also) |
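As a taste of what "stateful, event-time processing" looks like in practice, here is a minimal PyFlink Table API sketch that reads a Kafka topic and computes a 5-minute event-time tumbling sum. It assumes the Kafka SQL connector jar is on the classpath; the broker, topic, and column names are hypothetical.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming-mode Table environment (the Kafka SQL connector jar must be available)
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical source topic with an event-time column and a 30-second watermark
t_env.execute_sql("""
    CREATE TABLE orders (
        order_id STRING,
        amount   DOUBLE,
        ts       TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '30' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

# 5-minute tumbling window keyed by event time, not arrival time
result = t_env.sql_query("""
    SELECT window_start, SUM(amount) AS sales
    FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '5' MINUTES))
    GROUP BY window_start, window_end
""")
result.execute().print()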
ksqlDB / Kafka Streams (SQL and Java library)
Lightweight processing engines specific to Kafka. ksqlDB is a product that lets you work with Kafka via SQL, expressing aggregation and filtering without writing full Flink-class jobs. Kafka Streams is a Java library whose appeal is that stream processing can be embedded directly in your applications.
Both presuppose Kafka and cannot be used with other queues (Kinesis, etc.). They are effective when the use case can be expressed entirely in SQL, or when you want to embed processing in an existing Java app. They cannot handle processing as complex as Flink can, but the appeal is a learning cost that is an order of magnitude lower.
For workloads that Kafka + SQL can fully cover, ksqlDB is the shortest route. Migrate to Flink when the complexity grows.
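For comparison with the Flink sketch above, a similar 5-minute aggregation can be declared through the ksqlDB REST API with a few SQL statements. The following is a hedged sketch; the server address, topic, and column names are hypothetical.

```python
import requests

# Hypothetical ksqlDB server address
KSQLDB_URL = "http://localhost:8088/ksql"

# Declare a stream over an existing Kafka topic, then a continuously
# updated 5-minute aggregation - all in SQL, no Flink job to operate
statements = """
    CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
      WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
    CREATE TABLE views_per_5min AS
      SELECT url, COUNT(*) AS views
      FROM pageviews
      WINDOW TUMBLING (SIZE 5 MINUTES)
      GROUP BY url
      EMIT CHANGES;
"""

resp = requests.post(
    KSQLDB_URL,
    headers={"Accept": "application/vnd.ksql.v1+json"},
    json={"ksql": statements, "streamsProperties": {}},
)
resp.raise_for_status()
print(resp.json())
```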
Typical composition example
A typical streaming-platform composition is shown below. From event source to BI/DB, data flows in real time - the decisive difference from batch.
```mermaid
flowchart TB
SRC([Web / mobile / IoT / business systems])
BUS["Kafka / Kinesis / Pub/Sub"]
RT["Real-time processing<br/>Flink, ksqlDB"]
DWH["DWH ingestion<br/>Fivetran / Snowpipe"]
ACT["Immediate action<br/>notify/block"]
BI["Analysis / BI"]
SRC -->|event publish| BUS
BUS --> RT --> ACT
BUS --> DWH --> BI
classDef src fill:#fef3c7,stroke:#d97706;
classDef bus fill:#dbeafe,stroke:#2563eb,stroke-width:2px;
classDef rt fill:#fae8ff,stroke:#a21caf;
classDef bi fill:#dcfce7,stroke:#16a34a;
class SRC src;
class BUS bus;
class RT,ACT rt;
class DWH,BI bi;
```
The general pattern is a two-path split: real-time processing and immediate action on the left, ingestion into the analytics stack on the right.
The difficulty of Exactly-Once
The most troublesome thing in streaming is realizing Exactly-Once - the guarantee that each message is processed exactly once. Network failures, restarts, and timeouts easily cause double processing or loss, and in domains like bank transfers, payments, and inventory updates, a duplicate is a critical failure.
Kafka and Flink support Exactly-Once, but an end-to-end guarantee requires deliberate design: unless the consumer side is also idempotent (processing the same input twice yields the same result), the guarantee is meaningless.
| Guarantee level | Meaning | Difficulty |
|---|---|---|
| At-Most-Once | Give up on failure (loss possible) | Easy |
| At-Least-Once | Reliably delivered, with possible duplicates | Mid |
| Exactly-Once | Strictly once | Hard |
To avoid double processing, the standard approach is to design the consumer side to be idempotent. Exactly-Once is the shield; idempotency is the spear.
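Here is a minimal sketch of that shield-and-spear combination: At-Least-Once delivery from Kafka, with an idempotency check on the consumer side keyed by an event ID. The topic, tables, and `apply_payment` business logic are hypothetical; the point is that the idempotency record and the business write commit in the same transaction, and the Kafka offset is committed only after that.

```python
import json
import sqlite3
from confluent_kafka import Consumer

# Idempotency store: remembers which event IDs have already been applied.
# A real system would use the business database for both tables.
db = sqlite3.connect("processed.db")
db.execute("CREATE TABLE IF NOT EXISTS processed (event_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE IF NOT EXISTS payments (event_id TEXT, amount REAL)")
db.commit()

def apply_payment(conn: sqlite3.Connection, event: dict) -> None:
    """Hypothetical business write, done in the same transaction as the idempotency record."""
    conn.execute(
        "INSERT INTO payments (event_id, amount) VALUES (?, ?)",
        (event["event_id"], event["amount"]),
    )

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "payments",
    "enable.auto.commit": False,   # commit offsets manually, only after processing succeeds
})
consumer.subscribe(["payment-events"])  # hypothetical topic

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    # At-Least-Once delivery means the same event may arrive twice;
    # the PRIMARY KEY turns the second attempt into a no-op instead of a double charge.
    try:
        db.execute("INSERT INTO processed (event_id) VALUES (?)", (event["event_id"],))
    except sqlite3.IntegrityError:
        db.rollback()
        consumer.commit(msg)       # duplicate delivery: already applied, just advance the offset
        continue
    apply_payment(db, event)       # business write in the same transaction
    db.commit()                    # idempotency record + business write commit together
    consumer.commit(msg)           # advance the Kafka offset only after the DB commit
```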
Window processing
Streaming frequently involves time-bucketed aggregation (window processing), such as "sales in the last 5 minutes" or "errors per hour." What is a trivial aggregation in batch becomes a design problem in an unending stream: where do you cut it?
| Window type | Content | Example |
|---|---|---|
| Tumbling | Fixed-length, no overlap | 0-5 min, 5-10 min |
| Sliding | Fixed-length, slid forward | Last 5 min (updated every 1 min) |
| Session | Until activity breaks | One user’s visit session |
| Global | All time | Cumulative count |
In addition, distinguishing event time (when the event occurred) from processing time (when it arrived) matters: network delays deliver events out of order, and how to handle late-arriving events becomes a design point.
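To make the event-time vs. processing-time distinction concrete, here is a minimal pure-Python sketch of a 5-minute tumbling window that buckets by event time and has to decide what to do with late arrivals. The watermark logic is deliberately simplified and all names are illustrative.

```python
from collections import defaultdict

WINDOW_SECONDS = 300        # 5-minute tumbling windows
ALLOWED_LATENESS = 60       # tolerate events arriving up to 60 s late

windows: dict[float, float] = defaultdict(float)  # window start (epoch s) -> running sum
watermark = 0.0             # "we assume every event up to this time has arrived"

def window_start(event_ts: float) -> float:
    """Bucket by EVENT time (when it happened), not processing time (when it arrived)."""
    return event_ts - (event_ts % WINDOW_SECONDS)

def on_event(event_ts: float, amount: float) -> None:
    """Fold one event into its window; events past the watermark need an explicit policy."""
    global watermark
    watermark = max(watermark, event_ts - ALLOWED_LATENESS)
    start = window_start(event_ts)
    if start + WINDOW_SECONDS <= watermark:
        # The window is already final. The design decision: drop the event,
        # route it to a side output, or re-emit a corrected result.
        print(f"late event beyond allowed lateness, dropped: ts={event_ts}")
        return
    windows[start] += amount

# Events arrive out of order: the third event belongs to an already-closed window
on_event(1_000.0, 10.0)   # window [900, 1200)
on_event(1_250.0, 5.0)    # window [1200, 1500); advances the watermark to 1190
on_event(850.0, 3.0)      # window [600, 900) closed at watermark 1190 -> dropped
```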
Decision criteria
1. Is real-time truly needed
When considering streaming, the first thing to ask is whether real-time is truly needed. Requirements interviews often reveal that "a few minutes of delay is acceptable," which lets batch run cheaply and stably.
| Requirement | Recommended |
|---|---|
| Days of delay OK | Daily batch |
| Hours of delay OK | Hourly batch |
| 5-15 min delay OK | Microbatch |
| Seconds to 1 min OK | Lightweight streaming |
| Sub-100ms required | Serious streaming |
Businesses that truly need sub-100ms latency are limited: payments, fraud detection, ad bidding, IoT control, and exchanges.
2. Operational regime
Streaming runs continuously, 24/7, so the monitoring and incident-response load is an order of magnitude heavier than batch. Without an ops team, a single incident can lose hours of data.
| Operational regime | Recommended |
|---|---|
| No dedicated SREs (Site Reliability Engineers) or data engineers | Don't choose streaming |
| Can use managed services | Kinesis, Pub/Sub |
| Can self-operate Kafka | Kafka + Flink |
| 24/7 ops team available | Serious streaming platform |
3. Data volume and budget
Streaming-platform pricing scales with data volume. Kafka brokers, Kinesis shards, Pub/Sub message counts - left unattended, the monthly bill reaches millions of yen.
| Data volume | Example composition | Approx. monthly cost |
|---|---|---|
| ~1M events/day | Pub/Sub + Cloud Functions | Tens of thousands of yen |
| ~100M events/day | Kinesis + Lambda | Hundreds of thousands of yen |
| Tens of billions/day | Kafka + Flink self-operated | Millions+ |
How to choose by case
Small / AWS / SaaS startup
Kinesis Data Streams + Lambda. Managed, near-zero operations. Runs from tens of thousands of yen monthly.
Mid-size / GCP / data-analytics-focused
Pub/Sub + Dataflow. Beam-based with low learning cost. Excellent integration with BigQuery.
Large / can self-operate
Kafka + Flink. The strongest combination, but it requires an ops team; plan for 2-3 dedicated SREs.
Want SQL only
ksqlDB + Kafka or Materialize. SQL-writing analysts can operate it.
Don’t really need immediacy
Microbatch (dbt + 15-minute schedule). Avoids streaming operations while still looking close to real time.
Author’s note - the real meaning of “we want real-time”
There is an often-told story about a project where the customer said "we want a real-time dashboard," the team spent three months building a Kafka + Flink stack, and only later, on re-interviewing, learned that the actual business requirement was "a 30-minute delay is fine." For a job that a 15-minute cron schedule would have covered, the team then spent the next year being dragged out of bed by midnight incidents - a typical case, usually told together with exactly that punchline.
Another famous one is the large-scale AWS Kinesis outage of November 2020. In the us-east-1 region, Kinesis Data Streams went down for hours, dragging down even AWS's own management console and CloudWatch - an event widely cited as the lesson that the moment you depend on a streaming platform, its outage stops all of your business. A real-time platform can become not just a convenient tool but a new single point of failure.
I myself have developed the habit, when a customer says "we want to see the numbers in real time," of first asking back "would a few seconds of delay be a problem?" or "what about five minutes?" - because I have taken my share of bruises on this kind of project. Both stories show that straying from the basic rule of "streaming only when truly needed" makes operational load and availability risk rebound at the same time. The practical conclusion: if microbatch suffices, it is the safest and cheapest option.
When told “real-time,” break it down by numbers first. 5-min and 100ms delays are different worlds.
Phased decision matrix for freshness x streaming adoption
When told "real-time," first break it down into numbers - that is the practice. The optimum differs for each freshness requirement.
| Freshness requirement | Adopted tech | Monthly ops cost | Required SREs |
|---|---|---|---|
| Daily (yesterday's numbers by morning) | Daily batch (dbt) | Thousands of yen | 0 (covered part-time) |
| Hours of delay OK | Hourly batch | Tens of thousands | 0 (covered part-time) |
| 5-15 min delay OK | Microbatch (15-min dbt) | Tens of thousands | 0 |
| 1 min to a few sec | Lightweight streaming (Pub/Sub + Lambda) | Hundreds of thousands | 1 |
| Sub-100ms required | Serious streaming (Kafka + Flink) | Millions+ | 2-3 dedicated |
"The substantive lower bound for adopting streaming is two or more dedicated SREs." Adopt it below that and the team burns out under 24/7 incident response, window design, and Exactly-Once operations. The empirical rule: 90% of business requirements are fine with daily batch or microbatch. To merely "look real-time," microbatch is enough.
Real-time is roughly 10x batch's operational cost. Limit it to situations that truly need it.
Streaming-operation pitfalls and forbidden moves
Here are the typical accidents in streaming. All of them lead directly to data loss, double processing, or a full service outage.
| Forbidden move | Why it’s bad |
|---|---|
| Adopt streaming the moment the customer says "we want real-time" | Many are actually fine with a 30-minute delay. Interview for the real requirement first |
| At-Least-Once without idempotency | Network failures cause double payments and double inventory deductions |
| Self-operate Kafka without dedicated SREs | ZooKeeper/KRaft management and partition design will burn out the team |
| Fail to distinguish event time from processing time | Late-arriving events break aggregations. Design windows deliberately |
| Run Kafka with free-form JSON (no schema) | Consumers keep breaking. Protobuf/Avro + Schema Registry are required |
| Let the entire business depend on a single streaming platform | It becomes a single point of failure, as in the November 2020 AWS Kinesis outage |
| Rely on Exactly-Once alone | Meaningless without consumer-side idempotency. You need both the shield and the spear |
| Operate without a DLQ (Dead Letter Queue) | Failed messages retry forever and clog the pipeline (see the sketch after this section) |
| Set monitoring at batch granularity | Streaming needs second-level monitoring. Constantly watch message lag and consumer lag |
| No preparation for traffic surges | A Kafka partition shortage makes processing delay snowball |
| Schema changes without forward compatibility | Every old-version consumer breaks. Follow Avro/Protobuf compatibility rules |
The large-scale AWS Kinesis outage of November 25, 2020 stopped Kinesis Data Streams in us-east-1 for hours, with cascading impact on many AWS services such as CloudWatch, Cognito, and SQS. It is a lesson showing that the moment you depend on a streaming platform, its outage stops all of your business.
For serious streaming, 2-3 dedicated SREs and Schema Registry are the minimum requirements.
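As an example of the DLQ pattern from the table above, here is a hedged consumer sketch that retries a few times and then parks the failing message on a dead-letter topic instead of blocking the partition. Topic names and the `process` function are hypothetical.

```python
from confluent_kafka import Consumer, Producer

# Hypothetical topic names; a real setup would also alert on DLQ depth
SOURCE_TOPIC = "orders"
DLQ_TOPIC = "orders.dlq"
MAX_ATTEMPTS = 3

def process(value: bytes) -> None:
    """Hypothetical business logic; assume it may raise on a poison message."""
    ...

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-processor",
    "enable.auto.commit": False,
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe([SOURCE_TOPIC])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            process(msg.value())
            break
        except Exception as exc:
            if attempt == MAX_ATTEMPTS:
                # Park the poison message instead of retrying it forever
                producer.produce(DLQ_TOPIC, key=msg.key(), value=msg.value(),
                                 headers=[("error", str(exc).encode())])
                producer.flush()
    consumer.commit(msg)  # move on either way; the DLQ holds the failure for later inspection
```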
AI-era perspective
When AI-driven development (vibe coding) and AI usage are the premise, streaming gains importance as the path that immediately reflects AI-agent decisions. Fraud/anomaly detection and dynamic recommendation presuppose a platform where AI makes judgments in real time.
| Favored in the AI era | Disfavored in the AI era |
|---|---|
| Managed (Kinesis, Pub/Sub) | Self-built Kafka (operations hard for AI to learn) |
| Schema-driven (Avro, Protobuf) | JSON freedom |
| Event-driven (documented) | Implicit timing dependencies |
| SQL-centric (ksqlDB, Flink SQL) | Custom DSL (Domain-Specific Language) |
In an era where AI writes the processing code, streams with explicit schemas and contracts are favored. Managing types with Protobuf + Schema Registry and creating a state in which AI can safely generate processing code is the new standard.
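As a sketch of what "types as the contract" looks like, here is a hedged example of producing an Avro-serialized event through Schema Registry with the confluent-kafka Python client. Addresses, the topic, and the schema itself are illustrative.

```python
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Hypothetical Schema Registry and broker addresses
schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})

# The schema IS the contract: consumers (and AI-generated code) know the exact field types
ORDER_SCHEMA = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

serializer = AvroSerializer(schema_registry, ORDER_SCHEMA)
producer = Producer({"bootstrap.servers": "localhost:9092"})

order = {"order_id": "o-1001", "amount": 4200.0}
producer.produce(
    "orders",
    value=serializer(order, SerializationContext("orders", MessageField.VALUE)),
)
producer.flush()
```

Schema changes then go through the registry's compatibility checks instead of silently breaking consumers.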
For real-time platforms, make schemas explicit. AI-era streaming lives or dies by “types.”
Common misconceptions
- Adopt because real-time is cooler - operational cost is an order of magnitude higher. 90% of business requirements are fine with batch. Tech selection is decided by necessity
- Putting in Kafka solves problems - Kafka is just the “foundation.” Processing engines (Flink etc.), schema management, monitoring, and ops regime are also needed - heavy overall
- Exactly-Once means safe - guarantees aren’t omnipotent. Without consumer-side idempotency, end-to-end isn’t protected
- Streaming means it's fast - depending on the design it can be slower than batch. Poor window, state, and retry choices yield no performance benefit
What to decide - what is your project’s answer?
For each of the following, try to articulate your project's answer in one or two sentences. Starting work while these are still vague always invites the later question "why did we decide it this way again?"
- Is real-time truly needed (re-confirm requirements)
- Message queue (Kafka / Kinesis / Pub-Sub)
- Processing engine (Flink / ksqlDB / Spark Streaming / not needed)
- Guarantee level (At-Least-Once / Exactly-Once)
- Schema management (Avro/Protobuf + Schema Registry)
- Window design (time types, delay tolerance)
- Monitoring/alerting (SLO (Service Level Objective), metrics, failure notifications)
How to make the final call
The core of streaming is starting from the awareness that its operational cost is 10x that of batch. Real-time is not chosen because it is cool: 90% of business requirements are fine with batch or microbatch. Sub-100ms latency truly creates value only in limited cases - payments, fraud detection, ad bidding, IoT control - and choosing streaming anywhere else exhausts the team in 24/7 operations, Exactly-Once design, and the difficulty of window management. The correct order of judgment is to first ask whether it is truly needed, then whether there is an ops regime to run it.
The decisive axis is whether schemas and types are explicit. As streaming grows in importance as the path by which AI agents make real-time decisions, a free-form JSON design leaves AI unable to generate processing code safely. Make types explicit with Protobuf/Avro + Schema Registry and write SQL-centric processing (ksqlDB / Flink SQL) - that is the form of a streaming platform that does not become outdated in the AI era.
Selection priorities
- Question whether real-time is truly needed - if a few minutes’ delay is OK, substitute with microbatch
- Managed first - lower ops load with Kinesis/Pub-Sub; Kafka self-operation requires dedicated SREs
- Make schemas explicit - Protobuf/Avro + Schema Registry; no JSON freedom
- Design consumers idempotent - Exactly-Once is the shield, idempotency is the spear; both together
“Streaming only when truly needed.” Settle ops regime and schema design before choosing.
Summary
This article covered streaming, including selection of Kafka/Kinesis/Pub-Sub/Flink/ksqlDB, Exactly-Once and window processing, the freshness x operational-cost matrix, and judgment axes for avoiding over-investment in real-time.
Question whether real-time is truly needed, prioritize managed services, make schemas explicit, and design consumers idempotent. That is the practical answer for streaming in 2026.
Next time we’ll cover data governance (master management, catalog, regulatory compliance).
Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (44/89)