Last modified: January 24, 2026


Stream Processing

Stream processing involves ingesting, analyzing, and taking action on data as it is produced. This near-real-time or real-time methodology is helpful for applications that need to respond quickly to continuously updating information, such as IoT sensor readings, financial transactions, or social media data. By processing events or records as they arrive, systems can quickly generate insights, trigger alerts, or update dashboards without waiting for a complete batch.
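The core idea can be sketched in a few lines: events are handled one at a time as they arrive, and a reaction (here an alert) fires immediately. The event shape and the temperature threshold are illustrative assumptions, not from any real system.

```python
# A minimal sketch of event-at-a-time processing: react as each event
# arrives instead of waiting for a complete batch.

def sensor_events():
    """Simulated stream of IoT temperature readings."""
    for reading in [21.5, 22.0, 97.3, 21.8]:
        yield {"sensor": "s1", "temp_c": reading}

alerts = []
for event in sensor_events():      # handle each event as it arrives
    if event["temp_c"] > 90.0:     # react immediately; no batch wait
        alerts.append(f"ALERT: {event['sensor']} at {event['temp_c']}C")
```

In a batch system, the same logic would run only after all readings had been collected; here the alert is available as soon as the offending reading appears.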

ASCII DIAGRAM: Stream Processing Overview

+-----------------+      +---------------------------------+      +----------------+
|  Data Sources   |      |        Stream Processing        |      |  Final Output  |
| (Sensors, etc.) +----->+   (Real-time/Near-real-time)    +----->+  (Dashboards,  |
+-----------------+      |   +-----+   +-----+   +-----+   |      |  Alerts, etc.) |
                         |   | P1  |   | P2  |   | P3  |   |      +----------------+
                         +---------------------------------+

Message Brokers

Message brokers provide asynchronous communication between producing and consuming services, often serving as a backbone for stream processing pipelines.

ASCII DIAGRAM: Message Broker in the Pipeline

  Publisher(s)            Message Broker             Consumer(s)
+------------+   (Push)   +-----------+   (Pull)   +------------+
|  Service A | ---------> |   Topic   | <--------- |  Service B |
+------------+            +-----------+            +------------+
                                ^
                                |         (Pull)   +------------+
                                +----------------- | Service C  |
                                                   +------------+
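A toy in-memory broker makes the push/pull decoupling concrete. The class and method names here are illustrative, not any real broker's API: producers push messages onto a named topic, and consumers pull at their own pace.

```python
from collections import defaultdict, deque

# Toy in-memory broker sketch (illustrative API, not a real broker's).
class Broker:
    def __init__(self):
        self.topics = defaultdict(deque)   # topic name -> queued messages

    def publish(self, topic, message):     # called by producers (push)
        self.topics[topic].append(message)

    def consume(self, topic):              # called by consumers (pull)
        queue = self.topics[topic]
        return queue.popleft() if queue else None

broker = Broker()
broker.publish("orders", {"id": 1})
broker.publish("orders", {"id": 2})

first = broker.consume("orders")
second = broker.consume("orders")
```

The key property is asynchrony: the publisher returns as soon as the message is queued, and never blocks waiting for a consumer.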

Log-Based Message Brokers

Log-based brokers (like Apache Kafka) store messages in an immutable, append-only log. Consumers read from an offset that tracks their position, allowing for replaying messages or consuming them at different rates.

ASCII DIAGRAM: Log-Based Broker (Conceptual)

               +-------------+
Producer --->  | Partition 0 | ---> Consumer Group 1
               +-------------+
Producer --->  | Partition 1 | ---> Consumer Group 2
               +-------------+
Producer --->  | Partition 2 | ---> Consumer Group 2
               +-------------+
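The offset mechanism can be sketched with an append-only list plus a per-group position. The names (`append`, `read`, `replay`) are illustrative, not Kafka's actual API, but the behavior mirrors the description above: each group reads from its own offset, and rewinding the offset replays history.

```python
# Sketch of one partition of a log-based broker: an immutable append-only
# log plus per-consumer-group offsets.
class PartitionLog:
    def __init__(self):
        self.log = []             # append-only; entries are never mutated
        self.offsets = {}         # consumer group -> next offset to read

    def append(self, message):
        self.log.append(message)

    def read(self, group):
        offset = self.offsets.get(group, 0)
        batch = self.log[offset:]           # everything since last read
        self.offsets[group] = len(self.log)
        return batch

    def replay(self, group, offset=0):
        self.offsets[group] = offset        # rewind to re-consume history

log = PartitionLog()
for m in ["m0", "m1", "m2"]:
    log.append(m)

first_read = log.read("group-1")    # all three messages
log.append("m3")
second_read = log.read("group-1")   # only the new one
log.replay("group-1")
replayed = log.read("group-1")      # full history again
```

Because the log itself is never modified, slow and fast consumer groups coexist without interfering with one another.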

Change Data Capture (CDC)

CDC captures and streams row-level database changes in real time. By converting inserts, updates, and deletes into event streams, CDC simplifies synchronizing data across systems.

ASCII DIAGRAM: CDC Flow

   Database           CDC Engine             Message Broker            Consumers
+------------+   +---------------+          +---------------+      +-----------------+
|  Table(s)  |-->| Binlog Reader | -------> |   Topic(s)    | ---> |  Microservices  |
+------------+   +---------------+          +-------+-------+      +-----------------+
                                                    |
                                                    v
                                            +---------------+
                                            |   Analytics   |
                                            +---------------+

This approach is especially popular for maintaining read replicas, caches, or downstream analytics systems in near real time.
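The consuming side of a CDC pipeline reduces to applying each change event to a downstream copy. The event shape (`op`/`key`/`row`) used here is an illustrative assumption, loosely modeled on what CDC tools emit; the "replica" is just a dict standing in for a cache or read store.

```python
# Sketch: applying a stream of CDC events to keep a read replica in sync.
def apply_change(replica, event):
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]    # upsert the changed row
    elif op == "delete":
        replica.pop(key, None)         # remove the deleted row

replica = {}
changes = [
    {"op": "insert", "key": 1, "row": {"name": "Ada"}},
    {"op": "update", "key": 1, "row": {"name": "Ada L."}},
    {"op": "insert", "key": 2, "row": {"name": "Bob"}},
    {"op": "delete", "key": 2},
]
for event in changes:
    apply_change(replica, event)   # replica ends up holding only key 1
```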

Event Sourcing

Event Sourcing maintains an append-only log of all state changes in an application, rather than storing only the latest state. By replaying these events, the application’s state can be reconstructed at any point in time, providing excellent auditability and historical insight.
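A small sketch shows the central trick: state is never stored directly, only derived by folding over the event log, and replaying a prefix of the log yields the state as of an earlier point in time. The account events and amounts are illustrative.

```python
# Sketch: event sourcing for an account balance. The log is append-only;
# the balance is always reconstructed by replaying events.
events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 50},
]

def replay(events):
    """Fold over the log to reconstruct state."""
    balance = 0
    for e in events:
        if e["type"] == "deposited":
            balance += e["amount"]
        elif e["type"] == "withdrawn":
            balance -= e["amount"]
    return balance

current = replay(events)          # state now
as_of_second = replay(events[:2]) # state after only the first two events
```

Replaying a slice of the log is exactly the auditability benefit described above: any historical state can be recomputed on demand.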

Streams and Time

Time management in distributed stream systems can be complex due to network delays and clock drift. Common windowing strategies address how events are grouped over time:

  1. Hopping Windows: fixed-size windows with a “hop” period larger than the window size, leaving gaps in coverage. Example: a window of size 6 minutes where a new window starts every 10 minutes, leading to partial coverage.

  2. Sliding Windows: overlapping windows, each capturing a fixed duration but starting at smaller intervals. Example: a 6-minute window that slides every 2 minutes, allowing continuous coverage of the data.

  3. Tumbling Windows: consecutive, non-overlapping windows of a fixed size. Example: every 6 minutes forms a new batch of events with no overlap or gap.

ASCII DIAGRAM: Time Window Examples (Conceptual)

Hopping (Size=6, Hop=10)
[0---6]             [10--16]            [20--26]  

Sliding (Size=6, Slide=2)
[0---6]
   [2---8]
      [4---10]
         [6---12]  

Tumbling (Size=6)
[0---6][6---12][12--18][18--24]
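The three strategies can be sketched as functions that map an event timestamp (in minutes) to the start times of the windows containing it. The sizes and hops mirror the examples above; treating windows as half-open intervals [start, start + size) is an assumption of this sketch.

```python
# Sketch: which window(s) does a timestamp fall into?

def tumbling_window(ts, size=6):
    """Each event falls into exactly one non-overlapping window."""
    return (ts // size) * size

def hopping_windows(ts, size=6, hop=10):
    """With hop > size, some timestamps fall into no window at all."""
    start = (ts // hop) * hop
    return [start] if start <= ts < start + size else []

def sliding_windows(ts, size=6, slide=2):
    """Overlapping windows: every slide-aligned window containing ts."""
    first = ((ts - size) // slide + 1) * slide
    return [s for s in range(max(0, first), ts + 1, slide)
            if s <= ts < s + size]
```

For instance, with the parameters above, minute 7 lands in the gap between hopping windows, while every minute is covered by three overlapping sliding windows.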

Stream Joins

Joining data in a streaming environment allows correlation of events from multiple sources or combining real-time events with reference data.

ASCII DIAGRAM: Stream-Stream Join (Conceptual)

Stream A (Orders)         Join on OrderID          Stream B (Payments)
     (A1)  ------------------>   +--------------+   <---------------- (B1)
     (A2)  ------------------>   |  Join State  |   <---------------- (B2)
     (A3)  ------------------>   +--------------+   <---------------- (B3)

Intermediate state must be maintained to match events from each stream within a specified time window.
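That buffering can be sketched as a pair of per-side state tables keyed by OrderID. Eviction of expired state is omitted for brevity, and the keys, timestamps, and window length are illustrative assumptions.

```python
# Sketch of a stream-stream join: keep the latest event from each side in
# join state and match on a shared key within a time window.
WINDOW = 10  # join window in seconds

orders, payments, joined = {}, {}, []   # per-side join state + results

def on_event(side, key, ts):
    mine, other = (orders, payments) if side == "order" else (payments, orders)
    mine[key] = ts                      # remember this event in join state
    if key in other and abs(ts - other[key]) <= WINDOW:
        joined.append(key)              # matched within the window

on_event("order", "o1", ts=100)
on_event("payment", "o1", ts=105)   # within 10 s of the order -> joins
on_event("order", "o2", ts=100)
on_event("payment", "o2", ts=150)   # arrives too late -> no join
```

A production join must also expire old entries from the state tables, otherwise the buffered state grows without bound.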

Fault Tolerance

Ensuring a streaming system remains reliable in the face of failures is critical. A common technique is checkpointing: the processor periodically persists its state, so that after a crash it can resume from the most recent checkpoint rather than reprocessing the whole stream.

ASCII DIAGRAM: Checkpointing Example

        Stream
          |
          v
+---------+---------+
| Stream Processing |
+---------+---------+
          |
          | (Periodic Save State)
          v
+------------------+
|  Checkpoint Store|
+------------------+

When a node fails, it can recover from the latest checkpoint, replaying only the events since that point.
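The recovery path can be sketched end to end: state and input offset are saved every few events, a failure is simulated mid-stream, and a restart resumes from the last checkpoint, replaying only the events since then. All names and the running-sum "work" are illustrative.

```python
# Sketch of checkpoint-based recovery for a simple running sum.
events = [1, 2, 3, 4, 5, 6]
checkpoint = {"offset": 0, "state": 0}

def process(events, checkpoint, crash_at=None, every=2):
    state, offset = checkpoint["state"], checkpoint["offset"]
    for i in range(offset, len(events)):
        if i == crash_at:
            raise RuntimeError("node failure")
        state += events[i]                 # the "work": a running sum
        if (i + 1) % every == 0:           # periodic save of state + offset
            checkpoint["offset"], checkpoint["state"] = i + 1, state
    return state

try:
    process(events, checkpoint, crash_at=5)   # node dies mid-stream
except RuntimeError:
    pass                  # last checkpoint survives: offset=4, state=10

recovered = process(events, checkpoint)       # replays only events[4:]
```

Note the trade-off in the `every` parameter: frequent checkpoints mean less replay after a failure, but more overhead during normal operation.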

Key Stream Processing Technologies

When to Use Stream Processing