About this article
As the sixth installment of the “Data Architecture” category in the series “Architecture Crash Course for the Generative-AI Era,” this article explains streaming.
Question the "real-time" requirement and 90% of the time the answer turns out to be "a 5-minute-delay batch is enough." This article covers streaming-platform selection (Kafka/Kinesis/Pub-Sub/Flink/ksqlDB), Exactly-Once guarantees, window processing, and decision criteria, presenting the practical iron rule: first question whether real-time is truly needed.
Other articles in this category
What streaming handles
At the core sits a message queue (Kafka, Kinesis, Pub/Sub) that retains events as a persistent log and delivers them to multiple consumers simultaneously. The defining characteristic is that the data never stops; it keeps flowing - a fundamentally different worldview from batch.
Choose streaming only when truly needed. Operational cost is 10x batch.
Why it’s needed
1. Sub-second delay directly drives business value
Fraud detection, inventory synchronization, price synchronization, IoT control - "a few minutes later" is too late. The path from event occurrence to decision must complete in under a second.
2. Batch windows are disappearing
In 24/7 global operations, there is no quiet period left for an "overnight batch." Constantly flowing data has to be processed as it arrives.
3. Loose coupling between microservices
In microservices architectures, connecting services via events is mainstream. The streaming platform functions as the central nervous system of inter-service communication.
Batch vs streaming
| | Batch processing | Streaming |
|---|---|---|
| Processing unit | Bundled data | 1 event to a few |
| Delay | Hours to days | Milliseconds to seconds |
| Implementation | Relatively easy | Hard; operationally heavy |
| Cost | Cheap | Expensive |
| Retry | Easy redo | Hard to design |
| Representative tech | Spark, dbt | Kafka, Flink |
Most business requirements are fine with batch; situations where streaming is truly necessary are limited. If "looks roughly real-time" is enough, a 15-minute microbatch is often a sufficient substitute.
Main components
A streaming platform splits into "the layer that carries events" and "the layer that processes events." The former is the message queue (Kafka, etc.); the latter is the stream-processing engine (Flink, etc.). Their roles differ, so select them separately.
```mermaid
flowchart LR
PROD[Producer<br/>business systems/IoT]
MQ["Message queue<br/>(carrier layer)<br/>Kafka/Kinesis/Pub-Sub"]
PROC["Processing engine<br/>(aggregate/transform/join)<br/>Flink/ksqlDB"]
SINK1[(DWH/<br/>BigQuery)]
SINK2[(Search<br/>Elasticsearch)]
SINK3[Realtime<br/>dashboard]
PROD -->|Event| MQ
MQ --> PROC
PROC --> SINK1
PROC --> SINK2
PROC --> SINK3
MQ -. direct sink .-> SINK1
classDef prod fill:#fef3c7,stroke:#d97706;
classDef mq fill:#dbeafe,stroke:#2563eb,stroke-width:2px;
classDef proc fill:#fae8ff,stroke:#a21caf;
classDef sink fill:#dcfce7,stroke:#16a34a;
class PROD prod;
class MQ mq;
class PROC proc;
class SINK1,SINK2,SINK3 sink;
```
| Layer | Role | Representatives |
|---|---|---|
| Message queue | Persist and deliver events | Kafka, Kinesis, Pub/Sub |
| Stream-processing engine | Aggregate, transform, join | Flink, ksqlDB, Spark Streaming |
| Schema management | Define message types | Schema Registry, Protobuf |
Apache Kafka (industry-standard message queue)
Apache Kafka is OSS that originated at LinkedIn and is the de facto standard for streaming platforms. Its defining features are high throughput (millions of events per second), a design that persists events as a log, and a mechanism that lets multiple consumers read independently - it is adopted by large companies worldwide such as Netflix, Uber, and LINE. Confluent Platform (commercial) and Confluent Cloud (managed) are also options.
Its strength is high performance and scalability, but the price is an extremely heavy operational load: ZooKeeper management (replaced by KRaft in recent versions), broker and partition design, consumer-group coordination - serious use is hard without a dedicated ops team.
| Pros | Cons |
|---|---|
| Overwhelming performance, track record | Heavy operational load |
| OSS, thin vendor lock-in | High learning cost |
| Rich ecosystem (Connect, Streams, etc.) | Excessive at small scale |
| Low latency (millisecond order) | High cluster-design difficulty |
If you can self-operate Kafka, it is the strongest option; if not, consider a managed alternative (Kinesis / Pub/Sub / Confluent Cloud).
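As a concrete image of the "persist events as a log, multiple consumers read independently" model, here is a minimal sketch using the confluent-kafka Python client. The broker address, topic name, and consumer group are hypothetical placeholders, and error handling is largely omitted.

```python
from confluent_kafka import Producer, Consumer

# Hypothetical broker and topic names for illustration
BROKERS = "localhost:9092"
TOPIC = "orders"

# Producer: append an event to the topic (the persistent log)
producer = Producer({"bootstrap.servers": BROKERS})
producer.produce(TOPIC, key="order-123", value=b'{"amount": 4200}')
producer.flush()  # block until the broker acknowledges delivery

# Consumer: each consumer group keeps its own read position (offset),
# so another group (e.g. "analytics") can read the same log independently
consumer = Consumer({
    "bootstrap.servers": BROKERS,
    "group.id": "fraud-detection",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```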
Managed queues (Kinesis / Pub/Sub / Event Hubs)
Cloud-vendor-provided Kafka alternatives. The cloud provider handles operations, eliminating worries about scaling, availability, and backup - the biggest appeal is that even a small team can run a streaming platform. The standard choices are Kinesis on AWS, Pub/Sub on GCP, and Event Hubs on Azure.
| Pros | Cons |
|---|---|
| Near-zero operations | Cloud lock-in |
| Easy to start small | Can be more expensive at large scale |
| Easy integration with other managed services | Fine-grained tuning is hard |
| Cloud handles failures | Kafka-specific features unavailable |
Representatives: Amazon Kinesis Data Streams, Google Pub/Sub, Azure Event Hubs, Confluent Cloud
The modern rule: managed first, migrate to Kafka if you hit throughput limits.
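To illustrate how little code a managed queue needs on the producer side, here is a minimal boto3 sketch that publishes an event to Kinesis Data Streams. The stream name is hypothetical, and AWS credentials and region are assumed to be configured in the environment.

```python
import json
import boto3

# Hypothetical stream name; credentials/region come from the environment
kinesis = boto3.client("kinesis", region_name="ap-northeast-1")

event = {"event_type": "page_view", "user_id": "u-42", "ts": "2026-01-01T00:00:00Z"}

# PartitionKey decides the shard; events with the same key keep their order
kinesis.put_record(
    StreamName="clickstream",
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],
)
```

The consumer side is typically a Lambda trigger or a Kinesis Client Library application; neither requires broker or partition management.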
Apache Flink (top processing engine)
Apache Flink is OSS specialized in stateful stream processing, executing complex aggregations, joins, and event-time processing at millisecond latency. Used by Uber, Alibaba, and Stripe at the scale of tens of billions of events per day, it is the serious option that implements Exactly-Once most reliably.
On the other hand, its operational difficulty exceeds even Kafka's - checkpoint design, state-backend selection, job-restart management - and the learning cost is very high. Managed versions exist, such as Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) and Alibaba Cloud Realtime Compute, and adopting Flink through these to lower the operational load is the realistic path.
| Pros | Cons |
|---|---|
| Low latency, high throughput | High learning cost |
| Flexible to write complex processing | High operational difficulty |
| Robust Exactly-Once | Excessive at small scale |
| Strong event-time processing | Java/Scala primary (Python also) |
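As a taste of what "stateful, event-time processing" looks like in practice, here is a minimal PyFlink Table API sketch that reads a Kafka topic and computes a 5-minute event-time tumbling sum. It assumes the Kafka SQL connector jar is on the classpath; the broker, topic, and column names are hypothetical.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming-mode Table environment (the Kafka SQL connector jar must be available)
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical source topic with an event-time column and a 30-second watermark
t_env.execute_sql("""
    CREATE TABLE orders (
        order_id STRING,
        amount   DOUBLE,
        ts       TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '30' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'orders',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

# 5-minute tumbling window keyed by event time, not arrival time
result = t_env.sql_query("""
    SELECT window_start, SUM(amount) AS sales
    FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '5' MINUTES))
    GROUP BY window_start, window_end
""")
result.execute().print()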
ksqlDB / Kafka Streams (SQL and Java library)
Lightweight processing engines specific to Kafka. ksqlDB is a product that lets you work with Kafka via SQL, expressing aggregation and filtering without writing full Flink-class jobs. Kafka Streams is a Java library whose appeal is that stream processing can be embedded directly in your applications.
Both presuppose Kafka and cannot be used with other queues (Kinesis, etc.). They are effective when the use case can be expressed entirely in SQL, or when you want to embed processing in an existing Java app. They cannot handle processing as complex as Flink can, but the appeal is a learning cost that is an order of magnitude lower.
For workloads that Kafka + SQL can fully cover, ksqlDB is the shortest route. Migrate to Flink when the complexity grows.
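For comparison with the Flink sketch above, a similar 5-minute aggregation can be declared through the ksqlDB REST API with a few SQL statements. The following is a hedged sketch; the server address, topic, and column names are hypothetical.

```python
import requests

# Hypothetical ksqlDB server address
KSQLDB_URL = "http://localhost:8088/ksql"

# Declare a stream over an existing Kafka topic, then a continuously
# updated 5-minute aggregation - all in SQL, no Flink job to operate
statements = """
    CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
      WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
    CREATE TABLE views_per_5min AS
      SELECT url, COUNT(*) AS views
      FROM pageviews
      WINDOW TUMBLING (SIZE 5 MINUTES)
      GROUP BY url
      EMIT CHANGES;
"""

resp = requests.post(
    KSQLDB_URL,
    headers={"Accept": "application/vnd.ksql.v1+json"},
    json={"ksql": statements, "streamsProperties": {}},
)
resp.raise_for_status()
print(resp.json())
```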
Typical composition example
A typical streaming-platform composition is shown below. From event source to BI/DB, data flows in real time - the decisive difference from batch.
```mermaid
flowchart TB
SRC([Web / mobile / IoT / business systems])
BUS["Kafka / Kinesis / Pub/Sub"]
RT["Real-time processing<br/>Flink, ksqlDB"]
DWH["DWH ingestion<br/>Fivetran / Snowpipe"]
ACT["Immediate action<br/>notify/block"]
BI["Analysis / BI"]
SRC -->|event publish| BUS
BUS --> RT --> ACT
BUS --> DWH --> BI
classDef src fill:#fef3c7,stroke:#d97706;
classDef bus fill:#dbeafe,stroke:#2563eb,stroke-width:2px;
classDef rt fill:#fae8ff,stroke:#a21caf;
classDef bi fill:#dcfce7,stroke:#16a34a;
class SRC src;
class BUS bus;
class RT,ACT rt;
class DWH,BI bi;
```
The general pattern is a two-path split: real-time processing and immediate action on the left, ingestion into the analytics stack on the right.
The difficulty of Exactly-Once
The most troublesome thing in streaming is realizing Exactly-Once - the guarantee that each message is processed exactly once. Network failures, restarts, and timeouts easily cause double processing or loss, and in domains like bank transfers, payments, and inventory updates, a duplicate is a critical failure.
Kafka and Flink support Exactly-Once, but an end-to-end guarantee requires deliberate design: unless the consumer side is also idempotent (processing the same input twice yields the same result), the guarantee is meaningless.
| Guarantee level | Meaning | Difficulty |
|---|---|---|
| At-Most-Once | Give up on failure (loss possible) | Easy |
| At-Least-Once | Reliably delivered, with possible duplicates | Mid |
| Exactly-Once | Strictly once | Hard |
To avoid double processing, the standard approach is to design the consumer side to be idempotent. Exactly-Once is the shield; idempotency is the spear.
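Here is a minimal sketch of that shield-and-spear combination: At-Least-Once delivery from Kafka, with an idempotency check on the consumer side keyed by an event ID. The topic, tables, and `apply_payment` business logic are hypothetical; the point is that the idempotency record and the business write commit in the same transaction, and the Kafka offset is committed only after that.

```python
import json
import sqlite3
from confluent_kafka import Consumer

# Idempotency store: remembers which event IDs have already been applied.
# A real system would use the business database for both tables.
db = sqlite3.connect("processed.db")
db.execute("CREATE TABLE IF NOT EXISTS processed (event_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE IF NOT EXISTS payments (event_id TEXT, amount REAL)")
db.commit()

def apply_payment(conn: sqlite3.Connection, event: dict) -> None:
    """Hypothetical business write, done in the same transaction as the idempotency record."""
    conn.execute(
        "INSERT INTO payments (event_id, amount) VALUES (?, ?)",
        (event["event_id"], event["amount"]),
    )

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "payments",
    "enable.auto.commit": False,   # commit offsets manually, only after processing succeeds
})
consumer.subscribe(["payment-events"])  # hypothetical topic

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    # At-Least-Once delivery means the same event may arrive twice;
    # the PRIMARY KEY turns the second attempt into a no-op instead of a double charge.
    try:
        db.execute("INSERT INTO processed (event_id) VALUES (?)", (event["event_id"],))
    except sqlite3.IntegrityError:
        db.rollback()
        consumer.commit(msg)       # duplicate delivery: already applied, just advance the offset
        continue
    apply_payment(db, event)       # business write in the same transaction
    db.commit()                    # idempotency record + business write commit together
    consumer.commit(msg)           # advance the Kafka offset only after the DB commit
```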
Window processing
Streaming frequently involves time-bucketed aggregation (window processing), such as "sales in the last 5 minutes" or "errors per hour." What is a trivial aggregation in batch becomes a design problem in an unending stream: where do you cut it?
| Window type | Content | Example |
|---|---|---|
| Tumbling | Fixed-length, no overlap | 0-5 min, 5-10 min |
| Sliding | Fixed-length, slid forward | Last 5 min (updated every 1 min) |
| Session | Until activity breaks | One user’s visit session |
| Global | All time | Cumulative count |
In addition, distinguishing event time (when the event occurred) from processing time (when it arrived) matters: network delays deliver events out of order, and how to handle late-arriving events becomes a design point.
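To make the event-time vs. processing-time distinction concrete, here is a minimal pure-Python sketch of a 5-minute tumbling window that buckets by event time and has to decide what to do with late arrivals. The watermark logic is deliberately simplified and all names are illustrative.

```python
from collections import defaultdict

WINDOW_SECONDS = 300        # 5-minute tumbling windows
ALLOWED_LATENESS = 60       # tolerate events arriving up to 60 s late

windows: dict[float, float] = defaultdict(float)  # window start (epoch s) -> running sum
watermark = 0.0             # "we assume every event up to this time has arrived"

def window_start(event_ts: float) -> float:
    """Bucket by EVENT time (when it happened), not processing time (when it arrived)."""
    return event_ts - (event_ts % WINDOW_SECONDS)

def on_event(event_ts: float, amount: float) -> None:
    """Fold one event into its window; events past the watermark need an explicit policy."""
    global watermark
    watermark = max(watermark, event_ts - ALLOWED_LATENESS)
    start = window_start(event_ts)
    if start + WINDOW_SECONDS <= watermark:
        # The window is already final. The design decision: drop the event,
        # route it to a side output, or re-emit a corrected result.
        print(f"late event beyond allowed lateness, dropped: ts={event_ts}")
        return
    windows[start] += amount

# Events arrive out of order: the third event belongs to an already-closed window
on_event(1_000.0, 10.0)   # window [900, 1200)
on_event(1_250.0, 5.0)    # window [1200, 1500); advances the watermark to 1190
on_event(850.0, 3.0)      # window [600, 900) closed at watermark 1190 -> dropped
```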
Decision criteria
1. Is real-time truly needed
When considering streaming, the first thing to ask is whether real-time is truly needed. Requirements interviews often reveal that "a few minutes of delay is acceptable," which lets batch run cheaply and stably.
| Requirement | Recommended |
|---|---|
| Days of delay OK | Daily batch |
| Hours of delay OK | Hourly batch |
| 5-15 min delay OK | Microbatch |
| Seconds to 1 min OK | Lightweight streaming |
| Sub-100ms required | Serious streaming |
Businesses that truly need sub-100ms latency are limited: payments, fraud detection, ad bidding, IoT control, and exchanges.
2. Operational regime
Streaming runs continuously, 24/7, so the monitoring and incident-response load is an order of magnitude heavier than batch. Without an ops team, a single incident can lose hours of data.
| Operational regime | Recommended |
|---|---|
| No dedicated SREs (Site Reliability Engineers) or data engineers | Don't choose streaming |
| Can use managed services | Kinesis, Pub/Sub |
| Can self-operate Kafka | Kafka + Flink |
| 24/7 ops team available | Serious streaming platform |
3. Data volume and budget
Streaming-platform pricing scales with data volume. Kafka brokers, Kinesis shards, Pub/Sub message counts - left unattended, the monthly bill reaches millions of yen.
| Data volume | Example composition | Approx. monthly cost |
|---|---|---|
| ~1M events/day | Pub/Sub + Cloud Functions | Tens of thousands of yen |
| ~100M events/day | Kinesis + Lambda | Hundreds of thousands of yen |
| Tens of billions/day | Kafka + Flink self-operated | Millions+ |
How to choose by case
Small / AWS / SaaS startup
Kinesis Data Streams + Lambda. Managed, near-zero operations. Runs from tens of thousands of yen monthly.
Mid-size / GCP / data-analytics-focused
Pub/Sub + Dataflow. Beam-based with low learning cost. Excellent integration with BigQuery.
Large / can self-operate
Kafka + Flink. The strongest combination, but it requires an ops team; plan for 2-3 dedicated SREs.
Want SQL only
ksqlDB + Kafka or Materialize. SQL-writing analysts can operate it.
Don’t really need immediacy
Microbatch (dbt + 15-minute schedule). Avoids streaming operations while still looking close to real time.
Author’s note - the real meaning of “we want real-time”
There is an often-told story about a project where the customer said "we want a real-time dashboard," the team spent three months building a Kafka + Flink stack, and only later, on re-interviewing, learned that the actual business requirement was "a 30-minute delay is fine." For a job that a 15-minute cron schedule would have covered, the team then spent the next year being dragged out of bed by midnight incidents - a typical case, usually told together with exactly that punchline.
Another famous one is the large-scale AWS Kinesis outage of November 2020. In the us-east-1 region, Kinesis Data Streams went down for hours, dragging down even AWS's own management console and CloudWatch - an event widely cited as the lesson that the moment you depend on a streaming platform, its outage stops all of your business. A real-time platform can become not just a convenient tool but a new single point of failure.
I myself have developed the habit, when a customer says "we want to see the numbers in real time," of first asking back "would a few seconds of delay be a problem?" or "what about five minutes?" - because I have taken my share of bruises on this kind of project. Both stories show that straying from the basic rule of "streaming only when truly needed" makes operational load and availability risk rebound at the same time. The practical conclusion: if microbatch suffices, it is the safest and cheapest option.
When told “real-time,” break it down by numbers first. 5-min and 100ms delays are different worlds.
Phased decision matrix for freshness x streaming adoption
When told "real-time," first break it down into numbers - that is the practice. The optimum differs for each freshness requirement.
| Freshness requirement | Adopted tech | Monthly ops cost | Required SREs |
|---|---|---|---|
| Daily (yesterday's numbers by morning) | Daily batch (dbt) | Thousands of yen | 0 (covered part-time) |
| Hours of delay OK | Hourly batch | Tens of thousands | 0 (covered part-time) |
| 5-15 min delay OK | Microbatch (15-min dbt) | Tens of thousands | 0 |
| 1 min to a few sec | Lightweight streaming (Pub/Sub + Lambda) | Hundreds of thousands | 1 |
| Sub-100ms required | Serious streaming (Kafka + Flink) | Millions+ | 2-3 dedicated |
"The substantive lower bound for adopting streaming is two or more dedicated SREs." Adopt it below that and the team burns out under 24/7 incident response, window design, and Exactly-Once operations. The empirical rule: 90% of business requirements are fine with daily batch or microbatch. To merely "look real-time," microbatch is enough.
Real-time is roughly 10x batch's operational cost. Limit it to situations that truly need it.
Streaming-operation pitfalls and forbidden moves
Here are the typical accidents in streaming. All of them lead directly to data loss, double processing, or a full service outage.
| Forbidden move | Why it’s bad |
|---|---|
| Adopt streaming the moment the customer says "we want real-time" | Many are actually fine with a 30-minute delay. Interview for the real requirement first |
| At-Least-Once without idempotency | Network failures cause double payments and double inventory deductions |
| Self-operate Kafka without dedicated SREs | ZooKeeper/KRaft management and partition design will burn out the team |
| Fail to distinguish event time from processing time | Late-arriving events break aggregations. Design windows deliberately |
| Run Kafka with free-form JSON (no schema) | Consumers keep breaking. Protobuf/Avro + Schema Registry are required |
| Let the entire business depend on a single streaming platform | It becomes a single point of failure, as in the November 2020 AWS Kinesis outage |
| Rely on Exactly-Once alone | Meaningless without consumer-side idempotency. You need both the shield and the spear |
| Operate without a DLQ (Dead Letter Queue) | Failed messages retry forever and clog the pipeline (see the sketch after this section) |
| Set monitoring at batch granularity | Streaming needs second-level monitoring. Constantly watch message lag and consumer lag |
| No preparation for traffic surges | A Kafka partition shortage makes processing delay snowball |
| Schema changes without forward compatibility | Every old-version consumer breaks. Follow Avro/Protobuf compatibility rules |
The large-scale AWS Kinesis outage of November 25, 2020 stopped Kinesis Data Streams in us-east-1 for hours, with cascading impact on many AWS services such as CloudWatch, Cognito, and SQS. It is a lesson showing that the moment you depend on a streaming platform, its outage stops all of your business.
For serious streaming, 2-3 dedicated SREs and Schema Registry are the minimum requirements.
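As an example of the DLQ pattern from the table above, here is a hedged consumer sketch that retries a few times and then parks the failing message on a dead-letter topic instead of blocking the partition. Topic names and the `process` function are hypothetical.

```python
from confluent_kafka import Consumer, Producer

# Hypothetical topic names; a real setup would also alert on DLQ depth
SOURCE_TOPIC = "orders"
DLQ_TOPIC = "orders.dlq"
MAX_ATTEMPTS = 3

def process(value: bytes) -> None:
    """Hypothetical business logic; assume it may raise on a poison message."""
    ...

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-processor",
    "enable.auto.commit": False,
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe([SOURCE_TOPIC])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            process(msg.value())
            break
        except Exception as exc:
            if attempt == MAX_ATTEMPTS:
                # Park the poison message instead of retrying it forever
                producer.produce(DLQ_TOPIC, key=msg.key(), value=msg.value(),
                                 headers=[("error", str(exc).encode())])
                producer.flush()
    consumer.commit(msg)  # move on either way; the DLQ holds the failure for later inspection
```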
AI-era perspective
When AI-driven development (vibe coding) and AI usage are the premise, streaming gains importance as the path that immediately reflects AI-agent decisions. Fraud/anomaly detection and dynamic recommendation presuppose a platform where AI makes judgments in real time.
| Favored in the AI era | Disfavored in the AI era |
|---|---|
| Managed (Kinesis, Pub/Sub) | Self-built Kafka (operations hard for AI to learn) |
| Schema-driven (Avro, Protobuf) | JSON freedom |
| Event-driven (documented) | Implicit timing dependencies |
| SQL-centric (ksqlDB, Flink SQL) | Custom DSL (Domain-Specific Language) |
In an era where AI writes the processing code, streams with explicit schemas and contracts are favored. Managing types with Protobuf + Schema Registry and creating a state in which AI can safely generate processing code is the new standard.
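As a sketch of what "types as the contract" looks like, here is a hedged example of producing an Avro-serialized event through Schema Registry with the confluent-kafka Python client. Addresses, the topic, and the schema itself are illustrative.

```python
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Hypothetical Schema Registry and broker addresses
schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})

# The schema IS the contract: consumers (and AI-generated code) know the exact field types
ORDER_SCHEMA = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

serializer = AvroSerializer(schema_registry, ORDER_SCHEMA)
producer = Producer({"bootstrap.servers": "localhost:9092"})

order = {"order_id": "o-1001", "amount": 4200.0}
producer.produce(
    "orders",
    value=serializer(order, SerializationContext("orders", MessageField.VALUE)),
)
producer.flush()
```

Schema changes then go through the registry's compatibility checks instead of silently breaking consumers.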
For real-time platforms, make schemas explicit. AI-era streaming lives or dies by “types.”
Common misconceptions
- Adopt because real-time is cooler - operational cost is an order of magnitude higher. 90% of business requirements are fine with batch. Tech selection is decided by necessity
- Putting in Kafka solves problems - Kafka is just the “foundation.” Processing engines (Flink etc.), schema management, monitoring, and ops regime are also needed - heavy overall
- Exactly-Once means safe - guarantees aren’t omnipotent. Without consumer-side idempotency, end-to-end isn’t protected
- Streaming means it's fast - depending on the design it can be slower than batch. Poor window, state, and retry choices yield no performance benefit
What to decide - what is your project’s answer?
For each of the following, try to articulate your project's answer in one or two sentences. Starting work while these are still vague always invites the later question "why did we decide it this way again?"
- Is real-time truly needed (re-confirm requirements)
- Message queue (Kafka / Kinesis / Pub-Sub)
- Processing engine (Flink / ksqlDB / Spark Streaming / not needed)
- Guarantee level (At-Least-Once / Exactly-Once)
- Schema management (Avro/Protobuf + Schema Registry)
- Window design (time types, delay tolerance)
- Monitoring/alerting (SLO (Service Level Objective), metrics, failure notifications)
How to make the final call
The core of streaming is starting from the awareness that its operational cost is 10x that of batch. Real-time is not chosen because it is cool: 90% of business requirements are fine with batch or microbatch. Sub-100ms latency truly creates value only in limited cases - payments, fraud detection, ad bidding, IoT control - and choosing streaming anywhere else exhausts the team in 24/7 operations, Exactly-Once design, and the difficulty of window management. The correct order of judgment is to first ask whether it is truly needed, then whether there is an ops regime to run it.
The decisive axis is whether schemas and types are explicit. As streaming grows in importance as the path by which AI agents make real-time decisions, a free-form JSON design leaves AI unable to generate processing code safely. Make types explicit with Protobuf/Avro + Schema Registry and write SQL-centric processing (ksqlDB / Flink SQL) - that is the form of a streaming platform that does not become outdated in the AI era.
Selection priorities
- Question whether real-time is truly needed - if a few minutes’ delay is OK, substitute with microbatch
- Managed first - lower ops load with Kinesis/Pub-Sub; Kafka self-operation requires dedicated SREs
- Make schemas explicit - Protobuf/Avro + Schema Registry; no JSON freedom
- Design consumers idempotent - Exactly-Once is the shield, idempotency is the spear; both together
“Streaming only when truly needed.” Settle ops regime and schema design before choosing.
Summary
This article covered streaming, including selection of Kafka/Kinesis/Pub-Sub/Flink/ksqlDB, Exactly-Once and window processing, the freshness x operational-cost matrix, and judgment axes for avoiding over-investment in real-time.
Question whether real-time is truly needed, prioritize managed services, make schemas explicit, and design consumers idempotent. That is the practical answer for streaming in 2026.
Next time we’ll cover data governance (master management, catalog, regulatory compliance).
Back to series TOC -> ‘Architecture Crash Course for the Generative-AI Era’: How to Read This Book
I hope you’ll read the next article as well.
📚 Series: Architecture Crash Course for the Generative-AI Era (44/89)