[Data Architecture] Streaming - Question Whether It's Truly

About this article

As the sixth installment of the “Data Architecture” category in the series “Architecture Crash Course for the Generative-AI Era,” this article explains streaming.

Question the “real-time” requirement and, nine times out of ten, it turns out that a batch with a 5-minute delay is enough. This article covers streaming-platform selection (Kafka, Kinesis, Pub/Sub, Flink, ksqlDB), Exactly-Once, window processing, and decision criteria, all built on one practical iron rule: first question whether real-time is truly needed.

What is streaming

In a nutshell, streaming is “a mechanism that continuously processes data the instant it is generated.”

Think of a conveyor-belt sushi restaurant. Batch processing is the “collect orders and send them to the kitchen all at once” approach. Streaming is the “put each order on the belt the moment it comes in, so it reaches the customer right away” approach. Streaming is used where even a few seconds of delay is unacceptable, such as fraud detection, stock-price updates, and IoT sensors. As a rule of thumb, though, it carries roughly 10x the operational overhead of batch, so choose it only when truly necessary.

What streaming handles

At the core sits a message queue (Kafka, Kinesis, Pub/Sub) that retains events as a persistent log and delivers them to multiple consumers at once. The defining trait is that data never stops flowing, which is a fundamentally different worldview from batch.
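The idea of “a persistent log read by multiple consumers” can be sketched in a few lines. This is a toy in-memory model, not the API of Kafka, Kinesis, or Pub/Sub; the names `EventLog` and `Consumer` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only log: events stay in the log after they are read."""
    events: list = field(default_factory=list)

    def append(self, event: str) -> int:
        """Append an event and return its offset (its position in the log)."""
        self.events.append(event)
        return len(self.events) - 1

    def read(self, offset: int, max_events: int = 10) -> list:
        """Return up to max_events starting at offset, without removing them."""
        return self.events[offset:offset + max_events]


@dataclass
class Consumer:
    """Each consumer tracks its own offset, so readers do not affect each other."""
    name: str
    offset: int = 0

    def poll(self, log: EventLog) -> list:
        batch = log.read(self.offset)
        self.offset += len(batch)
        return batch


log = EventLog()
for e in ["order-1", "order-2", "order-3"]:
    log.append(e)

fraud = Consumer("fraud-detection")
analytics = Consumer("analytics")

fraud_batch = fraud.poll(log)          # reads the first three events
log.append("order-4")
analytics_batch = analytics.poll(log)  # starts from offset 0, sees all four
fraud_next = fraud.poll(log)           # only the event appended since its last poll
```

Because reading never deletes anything, adding a new consumer later does not disturb the existing ones, which is the property that separates a log from a classic work queue.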

Choose streaming only when truly needed. Operational cost is 10x batch.

Why it’s needed

1. Sub-second delay directly drives business value

Fraud detection, inventory sync, price sync, IoT control: in these, “a few minutes later” is too late. The path from event occurrence to decision must be sub-second.

2. Batch windows are disappearing

In 24/7 global operations, there is no quiet window left for an “overnight batch.” Constantly flowing data needs to be processed as it arrives.

3. Loose coupling between microservices

In microservices architectures, connecting services via events is mainstream. The streaming platform functions as the central nervous system of inter-service communication.

Batch vs streaming

Aspect | Batch processing | Streaming
Processing unit | Bundled data | One to a few events
Delay | Hours to days | Milliseconds to seconds
Implementation | Relatively easy | Hard, operationally heavy
Cost | Cheap | Expensive
Retry | Easy to redo | Hard to design
Representative tech | Spark, dbt | Kafka, Flink

Most business requirements are fine with batch, and cases where streaming is truly necessary are limited. If “looks real-time-ish” is enough, a 15-minute microbatch is often an adequate substitute.

Main components

A streaming platform splits into “the layer that carries events” and “the layer that processes events.” The former is the message queue (Kafka, etc.), the latter is the stream-processing engine (Flink, etc.). Their roles differ, so select them separately.

[Figure: 2-Layer Structure of Streaming Platforms. The message queue layer (event persistence and delivery: Apache Kafka, Kinesis, Pub/Sub, Event Hubs) streams events into the stream-processing engine layer (aggregation, transformation, and joining: Apache Flink, ksqlDB, Spark Streaming, Kafka Streams), with Schema Registry managing type definitions.]

Layer | Role | Representatives
Message queue | Persist and deliver events | Kafka, Kinesis, Pub/Sub
Stream-processing engine | Aggregate, transform, join | Flink, ksqlDB, Spark Streaming
Schema management | Define message types | Schema Registry, Protobuf

Apache Kafka (industry-standard message queue)

Apache Kafka is OSS that originated at LinkedIn and is the de facto standard for streaming platforms. Its hallmarks are throughput that can reach millions of events per second, a design that persists events as a log, and a mechanism that lets multiple consumers read independently. It has been adopted by large companies worldwide such as Netflix, Uber, and LINE. Confluent Platform (commercial) and Confluent Cloud (managed) are also options.

The strength is “high performance and scalability,” but it comes at the cost of an extremely heavy operational load. Managing ZooKeeper (replaced by KRaft in current versions), designing broker partitions, coordinating consumer groups: serious use is hard without a dedicated ops team.
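To make “partition design” and “consumer-group coordination” a little more concrete, here is a minimal sketch of the two decisions involved: which partition a keyed event lands in, and which consumer in a group reads which partition. It is a simplified model, not Kafka's implementation. Kafka's default partitioner hashes keys with murmur2, while this sketch substitutes CRC32 as a stand-in; `NUM_PARTITIONS`, `partition_for`, `route`, and `assign` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for the illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions: int = NUM_PARTITIONS):
    """Place (key, payload) events into partitions, preserving arrival order."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


def assign(num_partitions: int, consumers: list) -> dict:
    """Spread partitions round-robin across the consumers of one group."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


events = [("user-1", "login"), ("user-2", "login"), ("user-1", "purchase")]
routed = route(events)
two_consumers = assign(3, ["c1", "c2"])
three_consumers = assign(2, ["c1", "c2", "c3"])  # c3 gets no partition
```

Two design consequences fall out of this model: events with the same key stay in order because they share a partition, and the partition count caps how many consumers in a group can do useful work.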

Pros | Cons
Overwhelming performance, track record | Heavy operational load
OSS, thin vendor lock-in | High learning cost
Rich ecosystem (Connect, Streams, etc.) | Excessive at small scale
Low latency (millisecond order) | High cluster-design difficulty

If you can self-operate Kafka, it’s the strongest option; if not, consider managed services (Kinesis, Pub/Sub, Confluent Cloud).

Managed queues (Kinesis / Pub/Sub / Event Hubs)

These are cloud-vendor-provided alternatives to Kafka. The cloud handles operations, which removes most worries about scaling, availability, and backup. The biggest appeal is that even small teams can run a streaming platform. The standard choices are Kinesis on AWS, Pub/Sub on GCP, and Event Hubs on Azure.

Pros | Cons
Near-zero operations | Cloud lock-in
Easy to start small | Can be more expensive at large scale
Easy integration with other managed services | Fine-grained tuning is hard
Cloud handles failures | Kafka-specific features unavailable

Representatives: Amazon Kinesis Data Streams, Google Pub/Sub, Azure Event Hubs, Confluent Cloud

The modern rule: managed first, migrate to Kafka if you hit throughput limits.

Apache Flink (stateful stream-processing engine)

Apache Flink is OSS specialized in stateful stream processing, executing complex aggregations, joins, and event-time processing at millisecond latency. It is used by companies such as Uber, Alibaba, and Stripe at the scale of tens of billions of events per day, and it is the serious option when Exactly-Once must be implemented with high reliability.

On the other hand, its operational difficulty exceeds Kafka’s: checkpoint design, state-backend selection, and job-restart management all carry a very high learning cost. Managed versions exist, such as Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) and Alibaba Cloud Realtime Compute for Apache Flink, and adopting Flink through these to lower the operational load is the realistic route.

Pros | Cons
Low latency, high throughput | High learning cost
Flexible to write complex processing | High operational difficulty
Robust Exactly-Once | Excessive at small scale
Strong event-time processing | Java/Scala primary (Python also)

ksqlDB / Kafka Streams (SQL and Java library)

These are lightweight processing engines specific to Kafka. ksqlDB lets you work with Kafka through SQL, so aggregation and filtering can be expressed in SQL without writing full Flink-class jobs. Kafka Streams is a Java library whose appeal is that stream processing can be embedded directly in an application.

Both presuppose Kafka and can’t be used with non-Kafka queues (Kinesis, etc.). They are effective when the use case can be completed in SQL or when you want to embed processing in an existing Java app. They can’t handle processing as complex as Flink can, but the appeal is “an order of magnitude lower learning cost.”

When Kafka + SQL covers the whole use case, ksqlDB is the shortest route. Migrate to Flink when complexity grows.

Typical composition example

A typical streaming-platform composition is shown below. The decisive difference from batch is that data flows in real time all the way from event source to BI/DB.

[Figure: Typical Streaming Platform Architecture. Event sources (clickstream, order events, IoT sensors, app logs) feed a message queue (Kafka / Kinesis) with schema management. On the real-time side, Flink / ksqlDB performs aggregation, transformation, and alerting (fraud detection, inventory sync), driving alert notifications (Slack / PagerDuty) and business DB updates (Redis / PostgreSQL). On the analytics side, events are ingested into a DWH / data lake (BigQuery / S3) for BI dashboards (Looker / Tableau) and machine-learning training data.]

The general split is two lines: real-time processing on one side and analytics-bound ingestion on the other.

The difficulty of Exactly-Once

The most troublesome part of streaming is achieving the “guarantee of processing a message exactly once” (Exactly-Once). Network failures, restarts, and timeouts easily cause double processing or loss. In domains like bank transfers, payments, or inventory updates, duplication is critical.

Kafka and Flink support Exactly-Once, but end-to-end guarantees still require design: unless the consumer side is also idempotent (processing the same message twice leaves the same result as processing it once), the guarantee does not hold across the whole pipeline.

Guarantee level | Meaning | Difficulty
At-Most-Once | Give up on failure (loss possible) | Easy
At-Least-Once | Reliably delivered, with possible duplicates | Medium
Exactly-Once | Strictly once | Hard
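The difference between the first two levels largely comes down to whether the consumer records its progress before or after doing the work. The sketch below is a toy simulation under that assumption, with a single simulated crash; `consume`, `commit_before_processing`, and `crash_at` are hypothetical names, not part of any real client library.

```python
def consume(messages, commit_before_processing, crash_at):
    """Process messages with one simulated crash and return what got processed.

    commit_before_processing=True  -> at-most-once  (a crash loses the message)
    commit_before_processing=False -> at-least-once (a crash causes redelivery)
    """
    processed = []
    committed = 0      # offset the consumer resumes from after a restart
    crashed = False    # the crash is simulated only once
    while committed < len(messages):
        offset = committed
        msg = messages[offset]
        if commit_before_processing:
            committed = offset + 1           # progress is recorded first
            if offset == crash_at and not crashed:
                crashed = True               # crash before the work: message lost
                continue
            processed.append(msg)
        else:
            processed.append(msg)            # the work happens first
            if offset == crash_at and not crashed:
                crashed = True               # crash before commit: message redelivered
                continue
            committed = offset + 1
    return processed


at_most_once = consume(["a", "b", "c"], commit_before_processing=True, crash_at=1)
at_least_once = consume(["a", "b", "c"], commit_before_processing=False, crash_at=1)
```

In this model the at-most-once run skips the message that was in flight during the crash, while the at-least-once run processes it twice. Neither ordering alone yields exactly-once, which is why the consumer's own behavior matters.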

To avoid double processing, the royal road is to design the consumer side idempotent. Exactly-Once is the shield, idempotency is the spear.
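A minimal sketch of an idempotent consumer follows, assuming every event carries a unique `event_id` (an assumption for this illustration; the field names and the `InventoryConsumer` class are hypothetical). It remembers which events it has already applied, so a redelivered message becomes a no-op.

```python
class InventoryConsumer:
    """Applies stock deductions at most once per event_id."""

    def __init__(self, stock: dict):
        self.stock = dict(stock)
        self.seen_event_ids = set()

    def handle(self, event: dict) -> bool:
        """Apply the event if it is new; return True if it changed state."""
        if event["event_id"] in self.seen_event_ids:
            return False  # duplicate delivery: ignore
        self.stock[event["sku"]] -= event["quantity"]
        self.seen_event_ids.add(event["event_id"])
        return True


consumer = InventoryConsumer({"sku-1": 10})
deliveries = [
    {"event_id": "e-1", "sku": "sku-1", "quantity": 1},
    {"event_id": "e-1", "sku": "sku-1", "quantity": 1},  # redelivered duplicate
    {"event_id": "e-2", "sku": "sku-1", "quantity": 2},
]
results = [consumer.handle(e) for e in deliveries]
```

In a real system the set of seen IDs and the state change would need to be persisted together, for example in the same database transaction, so that a crash between the two cannot reintroduce the duplicate. The in-memory set here only shows the shape of the idea.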

Window processing

Streaming frequently involves time-bucketed aggregation (window processing), such as “sales in the last 5 minutes” or “errors per hour.” An aggregation that is easy in batch becomes a design question in an unending stream: where do you cut?

Window type | Content | Example
Tumbling | Fixed-length, no overlap | 0-5 min, 5-10 min
Sliding | Fixed-length, slid forward | Last 5 min (updated every 1 min)
Session | Until activity breaks | One user’s visit session
Global | All time | Cumulative count
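The first three window types can be expressed as small functions that answer “which window does this timestamp belong to?” The sketch below assumes integer timestamps in seconds, half-open windows `[start, end)`, and a session rule where a silence longer than `gap` starts a new session. These conventions and the function names are choices made for the illustration, not the definitions used by any particular engine.

```python
def tumbling_window(ts: int, size: int) -> tuple:
    """Return the single fixed-length window [start, end) containing ts."""
    start = ts - (ts % size)
    return (start, start + size)


def sliding_windows(ts: int, size: int, slide: int) -> list:
    """Return every window [start, start + size) containing ts,
    where window starts are multiples of slide."""
    windows = []
    start = ts - (ts % slide)          # latest window start at or before ts
    while start > ts - size:           # earlier starts still cover ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def session_windows(timestamps: list, gap: int) -> list:
    """Group timestamps into sessions; a silence longer than gap splits them.
    Returns (first_event, last_event) per session."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return [(s[0], s[-1]) for s in sessions]


tumbling = tumbling_window(430, size=300)
sliding = sliding_windows(430, size=300, slide=60)
sessions = session_windows([0, 20, 50, 400, 410], gap=60)
```

With a 5-minute window sliding every minute, one event lands in five windows at once, which is why sliding windows cost more state than tumbling ones.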

Additionally, it matters to distinguish event time (when the event occurred) from processing time (when it arrived). Network delays put events out of order, so “how to handle late-arriving events” becomes a design point.
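One common way to handle late arrivals is a watermark: the processor tracks how far event time has advanced, tolerates a bounded amount of lateness, and rejects events whose window has already closed. The sketch below is a simplified model of that idea, assuming the watermark is “largest event time seen minus an allowed lateness.” The class `EventTimeCounter` and its parameters are hypothetical and do not mirror any engine's API.

```python
from collections import defaultdict


class EventTimeCounter:
    """Counts events per tumbling event-time window, dropping events
    that arrive after their window has passed the watermark."""

    def __init__(self, window_size: int, allowed_lateness: int):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.counts = defaultdict(int)   # window start -> event count
        self.max_event_time = 0
        self.dropped = []                # event times rejected as too late

    @property
    def watermark(self) -> int:
        """Event time before which no more events are expected."""
        return self.max_event_time - self.allowed_lateness

    def on_event(self, event_time: int) -> bool:
        window_start = event_time - (event_time % self.window_size)
        window_end = window_start + self.window_size
        if window_end <= self.watermark:
            self.dropped.append(event_time)   # its window is already closed
            return False
        self.counts[window_start] += 1
        self.max_event_time = max(self.max_event_time, event_time)
        return True


counter = EventTimeCounter(window_size=60, allowed_lateness=30)
# Arrival order differs from event-time order: 50 and 55 arrive late.
accepted = [counter.on_event(t) for t in [10, 70, 50, 200, 55]]
```

Here the event at time 50 arrives out of order but is still counted, while the event at time 55 arrives after the watermark has moved past its window and is dropped. Whether dropped events are discarded, logged, or routed elsewhere is a separate design decision.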

Decision criteria

1. Is real-time truly needed

When considering streaming, the first thing to ask is whether real-time is truly needed. Requirements interviews often reveal that “a few minutes’ delay is OK,” in which case batch can run cheaply and stably.

Requirement | Recommended
Days of delay OK | Daily batch
Hours of delay OK | Hourly batch
5-15 min delay OK | Microbatch
Seconds to 1 min OK | Lightweight streaming
Sub-100ms required | Serious streaming

Businesses where sub-100ms is truly needed are limited - “payments, fraud detection, ad bidding, IoT control, exchanges.”
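Turning the requirement into a number makes the table mechanical. The function below is a rough sketch of that lookup. The table leaves gaps (for example between 1 and 5 minutes, and between 100 ms and 1 second), so the exact cut-off values here are assumptions chosen to err toward the faster option; `recommend_architecture` is a hypothetical name.

```python
def recommend_architecture(max_acceptable_delay_s: float) -> str:
    """Map the largest acceptable delay (in seconds) to a recommendation.

    Thresholds are illustrative assumptions, not values stated in the table.
    """
    if max_acceptable_delay_s >= 24 * 3600:
        return "Daily batch"
    if max_acceptable_delay_s >= 3600:
        return "Hourly batch"
    if max_acceptable_delay_s >= 5 * 60:
        return "Microbatch"
    if max_acceptable_delay_s >= 1:
        return "Lightweight streaming"
    return "Serious streaming"


dashboard = recommend_architecture(15 * 60)   # "a 15-minute delay is fine"
fraud_check = recommend_architecture(0.05)    # 50 ms budget
```

The point of writing it down is less the code than the conversation it forces: someone has to state the acceptable delay as a number before any platform is chosen.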

2. Operational regime

Streaming “runs 24/7 continuously,” so monitoring and incident-response load is an order of magnitude heavier than batch. Without an ops team, an incident can lose hours of data.

Operational regime | Recommended
No dedicated SRE/data engineers | Don’t choose streaming
Can use managed services | Kinesis, Pub/Sub
Can self-operate Kafka | Kafka + Flink
24/7 ops team available | Serious streaming platform

3. Data volume and budget

Streaming-platform pricing scales with data volume. Kafka brokers, Kinesis shards, Pub/Sub message counts - left unattended, monthly bills hit millions of yen.

Data volume | Composition image | Monthly target
~1M events/day | Pub/Sub + Cloud Functions | Tens of thousands of yen
~100M events/day | Kinesis + Lambda | Hundreds of thousands of yen
Tens of billions/day | Kafka + Flink self-operated | Millions+

How to choose by case

Small / AWS / SaaS startup

Kinesis Data Streams + Lambda. Managed, near-zero operations. Runs from tens of thousands of yen monthly.

Mid-size / GCP / data-analytics-focused

Pub/Sub + Dataflow. Beam-based with low learning cost. Excellent integration with BigQuery.

Large / can self-operate

Kafka + Flink. The strongest combination, but it requires an ops team. Plan for 2-3 dedicated SREs.

Want SQL only

ksqlDB + Kafka or Materialize. SQL-writing analysts can operate it.

Don’t really need immediacy

Microbatch (dbt + 15-min schedule). Avoid streaming operations while looking real-time-ish.

Author’s note - the real meaning of “we want real-time”

A story often told is about a project where the customer said “we want a real-time dashboard,” the team spent 3 months building a Kafka + Flink composition, and only later learned that the actual business requirement was “30-min delay is fine.” A cron job on a 15-minute schedule would have covered it, yet the team spent the next year being chased by midnight incident response. That punchline is usually part of the telling.

Another famous one is the large-scale AWS Kinesis outage of November 2020. In the us-east-1 region, Kinesis Data Streams went down for hours, dragging in even AWS’s own management console and CloudWatch. It is widely cited as the lesson that the moment you depend on a streaming platform, its outage stops all your business. Real-time platforms can become “not just a convenient tool but a new single point of failure.”

Both stories show that straying from the basic rule of “streaming only when truly needed” brings operational load and availability risk back on you at the same time. I have taken my own bumps on projects like these, so when a customer says “we want to see numbers in real time,” my habit is to first ask back “is a few seconds’ delay a problem?” or “what about 5 minutes?” If microbatch suffices, it is the safest and cheapest option. That is the practical conclusion.

When told “real-time,” break it down by numbers first. 5-min and 100ms delays are different worlds.

Phased decision matrix for freshness x streaming adoption

When told “real-time,” first break it down by numbers - that’s practice. The optimum differs per freshness requirement.

Freshness requirement | Adopted tech | Monthly ops cost | Required SREs
Daily (yesterday’s by morning) | Daily batch (dbt) | Thousands of yen | 0 (shared role)
Hours-delay OK | Hourly batch | Tens of thousands | 0 (shared role)
5-15 min delay OK | Microbatch (15-min dbt) | Tens of thousands | 0
1 min to a few sec | Lightweight streaming (Pub/Sub + Cloud Functions, or Kinesis + Lambda) | Hundreds of thousands | 1
Sub-100ms required | Serious streaming (Kafka + Flink) | Millions+ | 2-3 dedicated

“The practical lower bound for adopting serious streaming is 2+ dedicated SREs.” Adopt with fewer and the team melts under 24/7 incident response, window design, and Exactly-Once operations. The empirical rule is that 90% of business requirements are fine with daily batch or microbatch. To “look real-time-ish,” microbatch is enough.

Real-time costs 10x batch to operate. Limit it to cases where it is truly necessary.

Streaming-operation pitfalls and forbidden moves

Here are the typical accidents in streaming. All of them lead directly to data loss, double processing, or full service stoppage.

Forbidden move | Why it’s bad
Adopting streaming immediately because “the customer wants real-time” | Many are actually fine with a 30-min delay. Interview for requirements first
At-Least-Once without idempotency | Network failures cause double payments and double inventory deduction
Self-operating Kafka without dedicated SREs | ZooKeeper/KRaft management and partition design melt the team
Not distinguishing event time from processing time | Late-arriving events break aggregation. Design windows strictly
Operating Kafka with free-form JSON (no schema) | Consumers keep breaking. Protobuf/Avro + Schema Registry required
Making all business depend on a single streaming platform | It becomes a SPOF, as in the November 2020 AWS Kinesis outage
Trusting Exactly-Once entirely | Meaningless without idempotency on the consumer side. Both shield and spear are needed
Operating without a DLQ (Dead Letter Queue) | Failed messages retry forever and clog the pipeline
Setting monitoring at batch granularity | Streaming needs second-level monitoring. Constantly watch message lag and consumer lag
No preparation for traffic surges | A Kafka partition shortage makes processing delay snowball
Schema changes without forward compatibility | All old-version consumers die. Follow Avro/Protobuf compatibility rules
Assuming “putting in Kafka solves problems” | Processing engines, schema management, monitoring, and an ops regime are also needed, so the whole is heavy
Assuming “streaming means fast” | Depending on design it can be slower than batch; wrong window, state, or retry choices waste the performance
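The DLQ entry in the table above can be illustrated with a bounded-retry loop: a message that keeps failing is set aside with its error instead of blocking everything behind it. This is a minimal in-memory sketch. `process_with_dlq` and `max_attempts` are hypothetical names, and a real setup would publish dead letters to a separate topic or queue rather than a Python list.

```python
def process_with_dlq(messages, handler, max_attempts: int = 3):
    """Run handler on each message, retrying up to max_attempts times.

    Returns (processed_results, dead_letters). A message that fails every
    attempt goes to dead_letters so later messages are not blocked.
    """
    processed, dead_letters = [], []
    for msg in messages:
        last_error = None
        for _attempt in range(max_attempts):
            try:
                processed.append(handler(msg))
                break                      # success: stop retrying
            except Exception as exc:       # broad on purpose for the sketch
                last_error = exc
        else:
            # The loop ended without a break: every attempt failed.
            dead_letters.append(
                {"message": msg, "error": repr(last_error), "attempts": max_attempts}
            )
    return processed, dead_letters


# "x" is a poison message: int("x") raises ValueError on every attempt.
processed, dead_letters = process_with_dlq(["1", "x", "3"], int)
```

The message after the poison one still gets processed, and the dead letter keeps enough context (payload, error, attempt count) for someone to inspect and replay it later.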

In the large-scale AWS Kinesis outage of November 25, 2020, Kinesis Data Streams in us-east-1 stopped for hours, with cascading impact on many AWS services such as CloudWatch, Cognito, and SQS. It is a lesson that the moment you depend on a streaming platform, its outage stops all your business.

For serious streaming, 2-3 dedicated SREs and Schema Registry are the minimum requirements.

AI decision axes

AI-era favorable | AI-era unfavorable
Managed (Kinesis, Pub/Sub) | Self-built Kafka (operations hard for AI to learn)
Schema-driven (Avro, Protobuf) | JSON freedom
Event-driven (documented) | Implicit timing dependencies
SQL-centric (ksqlDB, Flink SQL) | Custom DSL (Domain-Specific Language)

  1. Question whether real-time is truly needed: if a few minutes’ delay is OK, substitute with microbatch.
  2. Managed first: lower ops load with Kinesis or Pub/Sub; Kafka self-operation requires dedicated SREs.
  3. Make schemas explicit: Protobuf/Avro + Schema Registry; no JSON freedom.
  4. Design consumers idempotent: Exactly-Once is the shield, idempotency is the spear; both together.

Managed streaming has abundant AI training data

Kinesis Data Streams and Cloud Pub/Sub have rich official documentation and sample code, so AI can accurately generate configuration code (Terraform) and Producer/Consumer code. Self-operated Kafka clusters have many project-specific settings, and there are cases where AI’s general knowledge alone can’t handle them accurately.

Schema-driven event design raises AI generation accuracy

When event schemas are registered in a Schema Registry with Avro/Protobuf, AI can accurately grasp “which fields are available in this event” and generate Consumer code. With schema-less free-form JSON, the event structure must be taught to AI each time.
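The contrast between a schema-driven event and free-form JSON can be shown without any registry at all. The sketch below uses a Python dataclass as a stand-in for an Avro or Protobuf schema; the `OrderPlaced` event, its fields, and `decode_order` are hypothetical. The point is that the available fields and their types are declared in one place that both people and code generators can read.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OrderPlaced:
    """Explicit event schema: every field and its type is declared."""
    order_id: str
    user_id: str
    amount_yen: int


def decode_order(raw: str) -> OrderPlaced:
    """Parse a JSON payload and validate it against the OrderPlaced schema.

    Unknown extra fields are ignored, so producers can add fields
    without breaking this consumer.
    """
    data = json.loads(raw)
    expected = {f.name: f.type for f in fields(OrderPlaced)}
    missing = expected.keys() - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for name, expected_type in expected.items():
        if not isinstance(data[name], expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")
    return OrderPlaced(**{name: data[name] for name in expected})


event = decode_order(
    '{"order_id": "o-1", "user_id": "u-1", "amount_yen": 1200, "coupon": "X"}'
)
```

A payload that omits `amount_yen` or sends it as a string fails loudly at the boundary instead of surfacing as a confusing error deep inside the consumer. Ignoring unknown fields is one simple way to tolerate additive schema changes; real Avro or Protobuf compatibility rules are richer than this.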

What to decide - what is your project’s answer?

For each of the following, try to articulate your project’s answer in 1-2 sentences. Starting work while these are still vague always invites later questions like “why did we decide this again?”

  • Is real-time truly needed (re-confirm requirements)
  • Message queue (Kafka / Kinesis / Pub/Sub)
  • Processing engine (Flink / ksqlDB / Spark Streaming / not needed)
  • Guarantee level (At-Least-Once / Exactly-Once)
  • Schema management (Avro/Protobuf + Schema Registry)
  • Window design (time types, delay tolerance)
  • Monitoring/alerting (SLO, metrics, failure notifications)

Related articles in this series:
https://en.senkohome.com/arch-intro-data-datastore/
https://en.senkohome.com/arch-intro-data-etl/
https://en.senkohome.com/arch-intro-data-governance/

Summary

This article covered streaming, including selection among Kafka, Kinesis, Pub/Sub, Flink, and ksqlDB; Exactly-Once and window processing; the freshness x operational-cost matrix; and judgment axes for avoiding over-investment in real-time.

Question whether real-time is truly needed, prioritize managed services, make schemas explicit, and design consumers idempotent. That is the practical answer for streaming in 2026.

Next time we’ll cover data governance (master management, catalog, regulatory compliance).

I hope you’ll read the next article as well.