About this article
As the sixth installment of the “Data Architecture” category in the series “Architecture Crash Course for the Generative-AI Era,” this article explains streaming.
Question the “real-time” requirement and 90% of the time it lands at “5-min-delay batch is enough.” This article covers streaming-platform selection (Kafka/Kinesis/Pub-Sub/Flink/ksqlDB), Exactly-Once, window processing, and decision criteria - presenting the practical iron rule of questioning whether real-time is truly needed first.
What is streaming
In a nutshell, streaming is “a mechanism that processes data in real time the instant it’s generated, continuously.”
Think of a conveyor-belt sushi restaurant. Batch processing is the “collect orders and send them to the kitchen all at once” approach. Streaming is the “the moment an order comes in, put it on the belt and it passes in front of the customer” approach. It’s used in scenarios where “even a few seconds’ delay is unacceptable” — fraud detection, stock-price updates, IoT sensors — but it costs 10x the operational overhead of batch, so the rule is to choose it only when truly necessary.
What streaming handles
At the core sits a message queue (Kafka, Kinesis, Pub/Sub) that retains events as a persistent log and simultaneously delivers them to multiple consumers. The feature is that data never stops, it keeps flowing - a fundamentally different worldview from batch.
Choose streaming only when truly needed. Operational cost is 10x batch.
Why it’s needed
1. Sub-second delay directly drives business value
Fraud detection, inventory linking, price linking, IoT control - “a few minutes later” is too late. From event occurrence to judgment must be sub-second.
2. Batch windows are disappearing
In 24/7 global operations, time for “overnight batch” is not available. Constantly flowing data needs to be processed as it arrives.
3. Loose coupling between microservices
In microservices architectures, connecting services via events is mainstream. The streaming platform functions as the central nervous system of inter-service communication.
Batch vs streaming
| Batch processing | Streaming | |
|---|---|---|
| Processing unit | Bundled data | 1 event to a few |
| Delay | Hours to days | Milliseconds to seconds |
| Implementation | Relatively easy | Hard, heavy operation |
| Cost | Cheap | Expensive |
| Retry | Easy redo | Hard to design |
| Representative tech | Spark, dbt | Kafka, Flink |
Most business requirements are fine with batch, and scenes where streaming is truly necessary are limited. If “looks real-time-ish” is enough, substituting with 15-min microbatch often works.
Main components
A streaming platform splits into “the layer that carries events” and “the layer that processes events.” The former is the message queue (Kafka, etc.), the latter is the stream-processing engine (Flink, etc.) - roles differ, so select them separately.
| Layer | Role | Representatives |
|---|---|---|
| Message queue | Persist and deliver events | Kafka, Kinesis, Pub/Sub |
| Stream-processing engine | Aggregate, transform, join | Flink, ksqlDB, Spark Streaming |
| Schema management | Define message types | Schema Registry, Protobuf |
Apache Kafka (industry-standard message queue)
Apache Kafka is OSS originating at LinkedIn and is the de facto standard for streaming platforms. The features are high throughput handling millions of events per second, designs that “persist events as a log,” and a mechanism where multiple consumers can independently read - adopted by mega-companies worldwide like Netflix, Uber, and LINE. Confluent Platform (commercial) and Confluent Cloud (managed) are also options.
The strength is “high performance and scalability,” but at the cost of extremely heavy operational load. Managing Zookeeper (KRaft today), broker partition design, consumer-group coordination - serious use is hard without a dedicated ops team.
| Pros | Cons |
|---|---|
| Overwhelming performance, track record | Heavy operational load |
| OSS, thin vendor lock-in | High learning cost |
| Rich ecosystem (Connect, Streams, etc.) | Excessive at small scale |
| Low latency (millisecond order) | High cluster-design difficulty |
If you can self-operate Kafka, it’s the strongest; if not, consider managed (Kinesis/Pub-Sub/Confluent Cloud).
Managed queues (Kinesis / Pub/Sub / Event Hubs)
Cloud-vendor-provided Kafka alternatives. The cloud handles operations, eliminating worries about scaling, availability, and backup - the biggest charm is that even small teams can have a streaming platform. AWS uses Kinesis, GCP uses Pub/Sub, Azure uses Event Hubs as standard choices.
| Pros | Cons |
|---|---|
| Near-zero operations | Cloud lock-in |
| Easy to start small | Can be more expensive at large scale |
| Easy integration with other managed services | Fine-grained tuning is hard |
| Cloud handles failures | Kafka-specific features unavailable |
Representatives: Amazon Kinesis Data Streams, Google Pub/Sub, Azure Event Hubs, Confluent Cloud
The modern rule: managed first, migrate to Kafka if you hit throughput limits.
Apache Flink (top processing engine)
Apache Flink is OSS specialized in stateful stream processing, executing complex aggregation, join, and event-time processing at millisecond latency. Used by Uber, Alibaba, Stripe at the scale of tens of billions of events per day - the serious option that implements Exactly-Once with the highest reliability.
On the other hand, operational difficulty exceeds Kafka - checkpoint design, state-backend selection, job-restart management - learning costs are very high. Managed versions exist like AWS Kinesis Data Analytics and Aliyun Realtime Compute, and adopting via these to lower operational load is realistic.
| Pros | Cons |
|---|---|
| Low latency, high throughput | High learning cost |
| Flexible to write complex processing | High operational difficulty |
| Robust Exactly-Once | Excessive at small scale |
| Strong event-time processing | Java/Scala primary (Python also) |
ksqlDB / Kafka Streams (SQL and Java library)
Lightweight processing engines specific to Kafka. ksqlDB is a product that handles Kafka via SQL, expressing aggregation and filtering in SQL without writing serious Flink-class processing. Kafka Streams is a library, with the appeal of being embeddable in applications to write stream processing.
Both presuppose Kafka and can’t be used with non-Kafka queues (Kinesis, etc.). Effective for SQL-completable use cases or wanting to embed processing in existing Java apps. Can’t do as complex processing as Flink, but the appeal is “an order of magnitude lower learning cost.”
For “scales completable with Kafka + SQL,” ksqlDB is the shortest route. Migrate to Flink when complexity grows.
Typical composition example
A typical streaming-platform composition is below. From event source to BI/DB, the decisive difference from batch is flowing in real time.
The general split is left-side real-time processing and right-side analytics-bound ingestion - a two-line split.
The difficulty of Exactly-Once
The most troublesome thing in streaming is realizing the “guarantee of processing a message exactly once” (Exactly-Once). Network failures, restarts, and timeouts easily cause double processing or loss. In businesses like bank transfers, payments, or inventory updates, duplication is critical.
Kafka and Flink support Exactly-Once, but “end-to-end guarantees require design,” and unless the consumer side is also designed idempotent (same input gives same result), it’s meaningless.
| Guarantee level | Meaning | Difficulty |
|---|---|---|
| At-Most-Once | Give up on failure (loss possible) | Easy |
| At-Least-Once | Reliably delivered, with possible duplicates | Mid |
| Exactly-Once | Strictly once | Hard |
To avoid double processing, the royal road is to design the consumer side idempotent. Exactly-Once is the shield, idempotency is the spear.
Window processing
Streaming sees frequent time-bucketed aggregation (window processing) like “sales in the last 5 minutes” or “errors per hour.” What’s an easy aggregation in batch becomes a design issue in an unending stream of “where to cut.”
| Window type | Content | Example |
|---|---|---|
| Tumbling | Fixed-length, no overlap | 0-5 min, 5-10 min |
| Sliding | Fixed-length, slid forward | Last 5 min (updated every 1 min) |
| Session | Until activity breaks | One user’s visit session |
| Global | All time | Cumulative count |
Additionally, distinguishing event time (occurrence time) from processing time (arrival time) matters - network delays disorder things, and “how to handle late-arriving events” becomes a design point.
Decision criteria
1. Is real-time truly needed
When considering streaming, the first thing to ask is whether real-time is truly needed. Hearings often reveal “a few minutes’ delay is OK,” letting batch operate cheaply and stably.
| Requirement | Recommended |
|---|---|
| Days of delay OK | Daily batch |
| Hours of delay OK | Hourly batch |
| 5-15 min delay OK | Microbatch |
| Seconds to 1 min OK | Lightweight streaming |
| Sub-100ms required | Serious streaming |
Businesses where sub-100ms is truly needed are limited - “payments, fraud detection, ad bidding, IoT control, exchanges.”
2. Operational regime
Streaming “runs 24/7 continuously,” so monitoring and incident-response load is an order of magnitude heavier than batch. Without an ops team, an incident can lose hours of data.
| Operational regime | Recommended |
|---|---|
| No dedicated SRE/data engineers | Don’t choose streaming |
| Can use managed services | Kinesis, Pub/Sub |
| Can self-operate Kafka | Kafka + Flink |
| 24/7 ops team available | Serious streaming platform |
3. Data volume and budget
Streaming-platform pricing scales with data volume. Kafka brokers, Kinesis shards, Pub/Sub message counts - left unattended, monthly bills hit millions of yen.
| Data volume | Composition image | Monthly target |
|---|---|---|
| ~1M events/day | Pub/Sub + Cloud Functions | Tens of thousands of yen |
| ~100M events/day | Kinesis + Lambda | Hundreds of thousands of yen |
| Tens of billions/day | Kafka + Flink self-operated | Millions+ |
How to choose by case
Small / AWS / SaaS startup
Kinesis Data Streams + Lambda. Managed, near-zero operations. Runs from tens of thousands of yen monthly.
Mid-size / GCP / data-analytics-focused
Pub/Sub + Dataflow. Beam-based with low learning cost. Excellent integration with BigQuery.
Large / can self-operate
Kafka + Flink. The strongest combination, but requires an ops team. 2-3 dedicated SREs are wanted.
Want SQL only
ksqlDB + Kafka or Materialize. SQL-writing analysts can operate it.
Don’t really need immediacy
Microbatch (dbt + 15-min schedule). Avoid streaming operations while looking real-time-ish.
Author’s note - the real meaning of “we want real-time”
There’s a story often told about a project where the customer said “we want a real-time dashboard,” and the team “spent 3 months building a Kafka + Flink composition,” only to re-hear later that the actual business requirement was “30-min delay is fine.” For a project that cron with a 15-min schedule would have covered, the team then spent the next year being chased by midnight incident response - a typical case told paired with that punchline.
Another famous one is the November 2020 large-scale AWS Kinesis outage. In US-East region, Kinesis Data Streams went down for hours, dragging in even AWS’s own management console and CloudWatch - an event widely talked about as the lesson that the moment you depend on a streaming platform, its outage stops all your business. Real-time platforms can become “not just a convenient tool but a new single point of failure.”
I myself developed the habit, when a customer asks “we want to see numbers in real time,” to first ask back “is a few seconds’ delay troublesome?” or “what about 5 minutes?” - because I’ve taken bumps from this kind of project in the past. Both show that going off the basic “streaming only when truly needed” causes operational load and availability risk to rebound simultaneously. If microbatch suffices, it’s safest and cheapest - the practical conclusion.
When told “real-time,” break it down by numbers first. 5-min and 100ms delays are different worlds.
Phased decision matrix for freshness x streaming adoption
When told “real-time,” first break it down by numbers - that’s practice. The optimum differs per freshness requirement.
| Freshness requirement | Adopted tech | Monthly ops cost | Required SREs |
|---|---|---|---|
| Daily (yesterday’s by morning) | Daily batch (dbt) | Thousands of yen | 0 (concurrent role) |
| Hours-delay OK | Hourly batch | Tens of thousands | 0 (concurrent role) |
| 5-15 min delay OK | Microbatch (15-min dbt) | Tens of thousands | 0 |
| 1 min to a few sec | Lightweight streaming (Pub/Sub + Lambda) | Hundreds of thousands | 1 |
| Sub-100ms required | Serious streaming (Kafka + Flink) | Millions+ | 2-3 dedicated |
“The substantive lower bound for adopting streaming is 2+ dedicated SREs.” Adopt below that and the team melts under 24/7 incident response, window design, and Exactly-Once operations. The empirical rule: 90% of business requirements are fine with daily batch or microbatch. To “look real-time-ish,” microbatch is enough.
Real-time is 10x batch’s operational cost. Limit it to truly necessary scenes.
Streaming-operation pitfalls and forbidden moves
Here are the typical accidents in streaming. All of them lead directly to data loss, double processing, or full service stoppage.
| Forbidden move | Why it’s bad |
|---|---|
| Adopt streaming immediately on “customer wants real-time” | Many actually OK with 30-min delay. Hear out requirements first |
| At-Least-Once without idempotency | Network failures cause double payments, double inventory deduction |
| Self-operate Kafka without dedicated SRE | Zookeeper/KRaft management, partition design melt the team |
| Don’t distinguish event time and processing time | Late-arriving events break aggregation. Strict window design |
| Operate Kafka with JSON freedom (no schema) | Consumers keep breaking. Protobuf/Avro + Schema Registry required |
| The moment you depend on a streaming platform, all business depends | SPOF like the November 2020 AWS Kinesis outage |
| Trust Exactly-Once entirely | Meaningless without idempotency on the consumer side. Both shield and spear needed |
| Operate without a DLQ (Dead Letter Queue) | Failed messages retry forever, clog up |
| Set monitoring at batch granularity | Streaming needs second-level monitoring. Constantly watch message lag and consumer lag |
| No prep for traffic surges | Kafka partition shortage causes processing delay to snowball |
| Schema changes without forward compatibility | All old-version consumers die. Follow Avro/Protobuf compatibility rules |
| Assuming “putting in Kafka solves problems” | Processing engines, schema management, monitoring, and ops regime are also needed — heavy overall |
| Assuming “streaming means fast” | Depending on design it can be slower than batch; wrong window/state/retry choices yield no performance |
The November 25, 2020 large-scale AWS Kinesis outage was an event where us-east-1’s Kinesis Data Streams stopped for hours, with cascading impact to many AWS services like CloudWatch, Cognito, and SQS. A lesson showing that the moment you depend on a streaming platform, its outage stops all your business.
For serious streaming, 2-3 dedicated SREs and Schema Registry are the minimum requirements.
AI decision axes
| AI-era favorable | AI-era unfavorable |
|---|---|
| Managed (Kinesis, Pub/Sub) | Self-built Kafka (operations hard for AI to learn) |
| Schema-driven (Avro, Protobuf) | JSON freedom |
| Event-driven (documented) | Implicit timing dependencies |
| SQL-centric (ksqlDB, Flink SQL) | Custom DSL (Domain-Specific Language) |
- Question whether real-time is truly needed — if a few minutes’ delay is OK, substitute with microbatch.
- Managed first — lower ops load with Kinesis/Pub-Sub; Kafka self-operation requires dedicated SREs.
- Make schemas explicit — Protobuf/Avro + Schema Registry; no JSON freedom.
- Design consumers idempotent — Exactly-Once is the shield, idempotency is the spear; both together.
Managed streaming has abundant AI training data
Kinesis Data Streams and Cloud Pub/Sub have rich official documentation and sample code, so AI can accurately generate configuration code (Terraform) and Producer/Consumer code. Self-operated Kafka clusters have many project-specific settings, and there are cases where AI’s general knowledge alone can’t handle them accurately.
Schema-driven event design raises AI generation accuracy
When event schemas are registered in a Schema Registry with Avro/Protobuf, AI can accurately grasp “which fields are available in this event” and generate Consumer code. With schema-less free-form JSON, the event structure must be taught to AI each time.
What to decide - what is your project’s answer?
For each of the following, try to articulate your project’s answer in 1-2 sentences. Starting work with these vague always invites later questions like “why did we decide this again?”
- Is real-time truly needed (re-confirm requirements)
- Message queue (Kafka / Kinesis / Pub-Sub)
- Processing engine (Flink / ksqlDB / Spark Streaming / not needed)
- Guarantee level (At-Least-Once / Exactly-Once)
- Schema management (Avro/Protobuf + Schema Registry)
- Window design (time types, delay tolerance)
- Monitoring/alerting (SLO, metrics, failure notifications)
Related Articles
https://en.senkohome.com/arch-intro-data-datastore/ https://en.senkohome.com/arch-intro-data-etl/ https://en.senkohome.com/arch-intro-data-governance/
Summary
This article covered streaming, including selection of Kafka/Kinesis/Pub-Sub/Flink/ksqlDB, Exactly-Once and window processing, the freshness x operational-cost matrix, and judgment axes for avoiding over-investment in real-time.
Question whether real-time is truly needed, prioritize managed services, make schemas explicit, and design consumers idempotent. That is the practical answer for streaming in 2026.
Next time we’ll cover data governance (master management, catalog, regulatory compliance).
Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (44/89)