Windowing in Google Cloud Dataflow (Fixed, Sliding, Session)
Learn the basics of windowing in Dataflow, with example data and visualizations
Fixed Window
Windows of a fixed duration, uniform across all keys, with no overlap between two consecutive windows
Use cases: periodic aggregations (counts, sums, averages per interval), batch analysis of data, and other relatively simple use cases
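To make the fixed-window rule concrete, here is a minimal plain-Python sketch of how an element's timestamp maps to exactly one window. It is not the Dataflow/Beam API (in the Beam Python SDK the corresponding window function is `FixedWindows`); the `readings` data and the helper names `fixed_window` and `group_by_fixed_window` are hypothetical, and windows are assumed to be half-open `[start, end)` and aligned to time 0.

```python
from collections import defaultdict

WINDOW_SIZE = 5  # seconds

# Hypothetical readings: (server_id, event_time_sec, cpu_percent)
readings = [
    ("server_1", 1, 40),
    ("server_2", 3, 55),
    ("server_1", 6, 70),
    ("server_3", 9, 20),
    ("server_2", 12, 65),
]


def fixed_window(timestamp, size=WINDOW_SIZE):
    """Return the single [start, end) window that contains timestamp."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def group_by_fixed_window(rows, size=WINDOW_SIZE):
    """Group values by (key, window); each element lands in exactly one window."""
    grouped = defaultdict(list)
    for key, ts, value in rows:
        grouped[(key, fixed_window(ts, size))].append(value)
    return dict(grouped)


for (key, (start, end)), values in sorted(group_by_fixed_window(readings).items()):
    print(f"{key} [{start}, {end}) -> {values}")
```

Because the window boundaries depend only on the timestamp, every key shares the same windows, which is what "uniform across all keys" means here.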
Sliding Window
Windows of a fixed duration, uniform across all keys, that overlap whenever the period is shorter than the duration (the same element can be present in multiple windows)
Use cases: moving averages of data
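The overlap can be sketched in plain Python as well. This is an illustration of the assignment logic under stated assumptions, not the Dataflow/Beam API (the Beam Python SDK's window function for this is `SlidingWindows`): windows are half-open `[start, end)`, window starts fall on multiples of the period, and the `readings` data and the helpers `sliding_windows` and `moving_averages` are hypothetical. It uses a 5 sec duration and a 4 sec period.

```python
from collections import defaultdict

# Hypothetical readings for one server: (event_time_sec, cpu_percent)
readings = [(1, 40), (4, 60), (6, 70), (8, 50)]


def sliding_windows(timestamp, size=5, period=4):
    """Return every [start, end) window that contains timestamp."""
    windows = []
    start = timestamp - (timestamp % period)  # latest window start <= timestamp
    while start > timestamp - size:  # window still covers the timestamp
        windows.append((start, start + size))
        start -= period
    return windows


def moving_averages(rows, size=5, period=4):
    """Average the values of each window; an element may count in several windows."""
    per_window = defaultdict(list)
    for ts, value in rows:
        for win in sliding_windows(ts, size, period):
            per_window[win].append(value)
    return {win: sum(vals) / len(vals) for win, vals in sorted(per_window.items())}


for (start, end), avg in moving_averages(readings).items():
    print(f"[{start}, {end}) -> average {avg}")
```

With these parameters, consecutive windows share one second, so only elements that fall in that shared second (for example timestamps 4 and 8) are counted twice.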
Session Window
Windows of dynamically determined duration, non-uniform across keys (each key gets its own windows, and window sizes differ per key), with no overlap between two windows of the same key
Use cases: user session data, click data, real-time gaming data analysis
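Session windows are easiest to understand as a merge: each element opens its own window of length equal to the gap, and windows of the same key that overlap are merged. The sketch below shows that idea in plain Python. It is not the Dataflow/Beam API (the Beam Python SDK's window function for this is `Sessions`); the `readings` data and the helper `session_windows` are hypothetical, and treating a gap of exactly 5 sec as the start of a new session is an assumption of this sketch that follows from using half-open `[start, end)` windows.

```python
from collections import defaultdict

GAP = 5  # seconds of inactivity that ends a session

# Hypothetical readings: (server_id, event_time_sec, cpu_percent)
readings = [
    ("server_1", 1, 40),
    ("server_1", 3, 45),
    ("server_1", 10, 80),
    ("server_2", 2, 55),
    ("server_2", 6, 60),
    ("server_2", 9, 65),
]


def session_windows(rows, gap=GAP):
    """Return {key: [(start, end, values), ...]} with one entry per session."""
    by_key = defaultdict(list)
    for key, ts, value in rows:
        by_key[key].append((ts, value))

    sessions = {}
    for key, items in by_key.items():
        items.sort()  # process each key's elements in event-time order
        merged = []
        for ts, value in items:
            if merged and ts < merged[-1][1]:
                # Element falls inside the open session: extend it.
                start, _, values = merged[-1]
                merged[-1] = (start, ts + gap, values + [value])
            else:
                # Gap exceeded (or first element): start a new session.
                merged.append((ts, ts + gap, [value]))
        sessions[key] = merged
    return sessions


for key, windows in session_windows(readings).items():
    for start, end, values in windows:
        print(f"{key} [{start}, {end}) -> {values}")
```

Here `server_1` ends up with two sessions and `server_2` with one longer session, which illustrates why window boundaries and sizes differ from key to key.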
Sample Data
CPU utilization percentages of 3 different servers across a span of 15 sec
Code
Fixed windows of 5 sec
Sliding windows of 5 sec duration and 4 sec period (how often a new window begins)
Session windows with a 5 sec gap (if the gap between two consecutive elements of a key is 5 sec or more, the current window closes and a new window starts)
Dataflow output
Visualize the Dataflow PCollection elements grouped into windows for all three window types
Elements are shown as circles on a 20 sec timeline. All elements with the same key (serverID) have the same color, and the number inside each circle is the measure (CPU utilization value)
For each element, check the start and end timestamps of the window it belongs to, the key (serverID), the element timestamp, and the CPU utilization value
Hope this helps all the Dataflow enthusiasts :)