Data streaming: what is the difference between the tumbling and sliding window?

When you start processing streams of events, there always comes a time to decide on how to group them. We have a few kinds of window functions that we can use for such a grouping.

First, I have to state the obvious. All window operations output the result at the end of the window. Of course they do: a five-second sliding window cannot see events from the future.

Once we think about grouping events for a while, we understand that it is the only possible option, but some people forget about it during job interviews. Hopefully, you will remember it when you hear a tricky question about window functions.

The other trap is the fact that some projects misuse the names of window functions. For example, the function that Apache Flink calls a sliding window is described as a hopping window in Azure Stream Analytics.


Window functions

Now, let’s move to the actual topic. I’m going to begin with the most popular type of window function: the sliding window. It comes in two variants: a time-based sliding window and an eviction-based sliding window.

If the tool you use implements the sliding window as an actual sliding window, you will never get an empty set as the output. Of course, if the “sliding window” you are using is, in fact, a hopping window in disguise, empty results may occur.

The time-based sliding window gives us the events that happened during the last t seconds. Let’s look at an example. We have a ten-second stream of events, which we group into five-second sliding windows.
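To make the example concrete, here is a minimal Python sketch of a time-based sliding window. It is an illustration, not the implementation of any particular streaming engine. The function name `time_sliding_windows`, the sample events, and the choice to emit a result every time an event arrives are my assumptions.

```python
from collections import deque


def time_sliding_windows(events, length):
    """Yield (timestamp, values) every time an event arrives.

    `events` is an iterable of (timestamp, value) pairs sorted by timestamp.
    Each result holds the events from the interval
    (timestamp - length, timestamp], so it never looks into the future.
    """
    window = deque()
    for timestamp, value in events:
        window.append((timestamp, value))
        # Evict events that are `length` seconds old or older.
        while window[0][0] <= timestamp - length:
            window.popleft()
        yield timestamp, [v for _, v in window]


# Hypothetical ten-second stream: (second, event name).
events = [(1, "a"), (2, "b"), (4, "c"), (7, "d"), (8, "e"), (9, "f")]
sliding = list(time_sliding_windows(events, length=5))
```

Because the sketch emits a result only when an event arrives, every result contains at least that event, so the output is never empty.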

Eviction-based sliding windows always contain n elements. For example, when I apply the sliding window function to get five-element slices of the previous events, every result holds the five most recent events, and each new event evicts the oldest one.
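The eviction-based variant can be sketched in a few lines of Python. The name `count_sliding_windows` is hypothetical, and I assume that nothing is emitted until the first n events have arrived, which keeps every slice exactly n elements long.

```python
from collections import deque


def count_sliding_windows(events, n):
    """Yield the last `n` events every time a new event arrives."""
    # A deque with maxlen evicts its oldest element when a new one is appended.
    window = deque(maxlen=n)
    for event in events:
        window.append(event)
        # Assumption: skip the warm-up phase, so every slice has n elements.
        if len(window) == n:
            yield list(window)


slices = list(count_sliding_windows(range(1, 8), n=5))
```

With seven events numbered from 1 to 7, the sketch produces three slices, and each one drops the oldest event of the previous slice.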

The third kind of window function is a hopping window. As the name suggests, it is the window function that “jumps.” Because of that, we must specify the length of the window and the length of the jump. They do not need to be the same number!

For example, I can specify 2.5-second jumps and a five-second window, so consecutive windows overlap by 2.5 seconds.
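Here is a rough Python sketch of a hopping window with those two parameters. The function `hopping_windows`, the sample events, and the window boundaries are my assumptions: I treat every window as the half-open interval [start, end) and keep only windows that fit entirely inside the stream. Real engines differ in such details.

```python
def hopping_windows(events, length, hop, duration):
    """Return (start, end, values) for every window [start, end).

    Windows start at 0, hop, 2 * hop, ... and are kept only when they
    end no later than `duration`, the length of the stream in seconds.
    """
    windows = []
    index = 0
    while index * hop + length <= duration:
        start = index * hop
        end = start + length
        values = [v for t, v in events if start <= t < end]
        windows.append((start, end, values))
        index += 1
    return windows


# Hypothetical ten-second stream: (second, event name).
events = [(1, "a"), (2, "b"), (4, "c"), (7, "d"), (8, "e"), (9, "f")]
hopping = hopping_windows(events, length=5, hop=2.5, duration=10)
```

Under these assumptions, the stream is split into three overlapping windows: 0 to 5, 2.5 to 7.5, and 5 to 10. Unlike the sliding window, a hopping window is emitted on schedule even when no event falls into it, so empty results are possible.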

The last kind of window function is the tumbling window. It is a hopping window with equal “jump” and length. In this case, with a five-second length, my ten seconds of example events get grouped into only two windows.
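Since tumbling windows do not overlap, every event belongs to exactly one window, and the grouping reduces to an integer division. The sketch below uses a hypothetical `tumbling_windows` function and made-up events. For brevity, it does not report windows that received no events.

```python
from collections import defaultdict


def tumbling_windows(events, length):
    """Return (start, end, values) for every non-empty window [start, end)."""
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Integer division assigns each event to exactly one window.
        buckets[int(timestamp // length)].append(value)
    return [
        (index * length, (index + 1) * length, buckets[index])
        for index in sorted(buckets)
    ]


# Hypothetical ten-second stream: (second, event name).
events = [(1, "a"), (2, "b"), (4, "c"), (7, "d"), (8, "e"), (9, "f")]
tumbling = tumbling_windows(events, length=5)
```

The first window collects the events from seconds 0 to 5, the second one the events from seconds 5 to 10, and no event appears twice.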


Bartosz Mikulski
Bartosz Mikulski * MLOps Engineer / data engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group
