Windowing in Google Cloud Dataflow (Fixed, Sliding, Session)

Pavan Kumar Kattamuri
Aug 1, 2020


Learn the basics of windowing in Dataflow with example data and visualizations

Fixed Window

Windows of a fixed duration, uniform across all keys, with no overlap between two consecutive windows

Use cases: general aggregations, batch analysis of data, and other relatively simple scenarios

Sliding Window

Windows of a fixed duration, uniform across all keys, with overlap between consecutive windows (the same element can be present in more than one window)

Use cases: moving averages of data

Session Window

Windows of dynamically determined duration, non-uniform across keys (each key gets its own windows, and window sizes can differ within a key), with no overlap between two windows of the same key

Use cases: user session data, click data, real-time gaming data analysis

Sample Data

CPU utilization percentages of 3 different servers across a span of 15 sec

Code

Fixed windows of 5 sec
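To make the assignment rule concrete, here is a minimal pure-Python sketch of how an element timestamp maps to a 5 sec fixed window. It is not Dataflow or Beam code: the function name `fixed_window` and the sample readings are hypothetical, timestamps are assumed to be whole seconds, and windows are assumed to be half-open intervals [start, end). In the Apache Beam Python SDK the equivalent windowing is requested with `beam.WindowInto(window.FixedWindows(5))`.

```python
def fixed_window(timestamp, size=5, offset=0):
    """Return the half-open [start, end) fixed window containing `timestamp`."""
    # Round the timestamp down to the nearest multiple of `size` (shifted by `offset`).
    start = timestamp - (timestamp - offset) % size
    return (start, start + size)


# Hypothetical (server_id, timestamp_sec, cpu_percent) readings, for illustration only.
readings = [
    ("server_1", 1, 40),
    ("server_2", 4, 55),
    ("server_3", 5, 70),
    ("server_1", 12, 35),
]

for server_id, ts, cpu in readings:
    # Every key uses the same window boundaries: 0-5, 5-10, 10-15, ...
    print(server_id, ts, cpu, fixed_window(ts))
```

Because the boundaries depend only on the timestamp, an element at second 5 opens the 5 to 10 window rather than closing the 0 to 5 window.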

Sliding windows of 5 sec duration and 4 sec period (the period is how often a new window begins)
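The overlap is easier to see with a small pure-Python sketch that lists every window a timestamp falls into. This is an illustration under stated assumptions, not Dataflow or Beam code: `sliding_windows` is a hypothetical helper, timestamps are whole seconds, and windows are half-open [start, end) intervals that begin at multiples of the period. In the Apache Beam Python SDK the equivalent windowing is requested with `beam.WindowInto(window.SlidingWindows(5, 4))`.

```python
def sliding_windows(timestamp, size=5, period=4, offset=0):
    """Return all half-open [start, end) sliding windows containing `timestamp`."""
    # Latest window start that is <= timestamp.
    last_start = timestamp - (timestamp - offset) % period
    # Step back one period at a time while the window still covers the timestamp,
    # i.e. while start > timestamp - size.
    return [
        (start, start + size)
        for start in range(last_start, timestamp - size, -period)
    ]


for ts in (2, 4, 7, 8):
    print(ts, sliding_windows(ts))
```

With a 5 sec duration and a 4 sec period, consecutive windows share one second, so only timestamps in that shared second (4, 8, 12, ...) land in two windows; all other timestamps land in exactly one.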

Session windows of 5 sec gap (if the gap between two consecutive elements for a key is 5 sec or more, the current window closes and a new window starts)
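Session windows are built per key, which a short pure-Python sketch can show. This is an illustration and not Dataflow or Beam code: `session_windows` and the sample readings are hypothetical, timestamps are whole seconds, and the sketch assumes each element starts as its own half-open window [ts, ts + gap) with overlapping windows of the same key then merged. Under that assumption, two elements exactly 5 sec apart end up in separate sessions. In the Apache Beam Python SDK the equivalent windowing is requested with `beam.WindowInto(window.Sessions(5))`.

```python
from collections import defaultdict


def session_windows(readings, gap=5):
    """Group (key, timestamp) pairs into per-key sessions of [start, end) windows."""
    by_key = defaultdict(list)
    for key, ts in readings:
        by_key[key].append(ts)

    sessions = {}
    for key, stamps in by_key.items():
        merged = []
        for ts in sorted(stamps):
            if merged and ts < merged[-1][1]:
                # Element falls inside the open session: extend the session end.
                start, end = merged[-1]
                merged[-1] = (start, max(end, ts + gap))
            else:
                # Gap reached: start a new session for this key.
                merged.append((ts, ts + gap))
        sessions[key] = merged
    return sessions


# Hypothetical (server_id, timestamp_sec) pairs, for illustration only.
readings = [
    ("server_1", 1), ("server_1", 3), ("server_1", 10),
    ("server_2", 2), ("server_2", 7),
]
print(session_windows(readings))
```

Each key gets its own list of windows, and the window lengths differ depending on how closely that key's elements are spaced.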

Dataflow output

Visualization of the Dataflow PCollection elements grouped into windows, for each of the three window types

Elements are shown as circles on a 20 sec timeline. All elements with the same key (serverID) share the same color, and the number inside each circle is the measure (CPU utilization value)

For each element, check the start and end timestamps of the window it belongs to, the key (serverID), the element timestamp, and the CPU utilization value

Hope this helps all the Dataflow enthusiasts :)

