Data streaming: what is the difference between the tumbling and sliding window?

When you start processing streams of events, sooner or later you have to decide how to group them. There are a few kinds of window functions we can use for such grouping.

First, let me state the obvious: all window operations output the result at the end of the window. Of course they do; a five-second-long sliding window cannot see events from the future.

Once we think about grouping events for a while, it becomes clear that this is the only possible option, yet some people forget about it during job interviews. Hopefully, you will remember it when you hear a tricky question about window functions.

The other trap is that some projects misuse the names of window functions, so the same name can mean different things in different tools. For example, the function that Apache Flink calls a sliding window is described as a hopping window in Azure Stream Analytics.

Window functions

Now, let’s move to the actual topic. I’m going to begin with the most popular type of window function: the sliding window. There are two options: a time-based sliding window or an eviction-based sliding window.

If the tool you use implements the sliding window as an actual sliding window, you will never get an empty set as the output. Of course, if the “sliding window” you are using is, in fact, a hopping window in disguise, empty results may occur.

The time-based sliding window gives us the events that happened during the last t seconds. Let’s look at an example. We have a ten-second-long stream of events, which we group into five-second-long sliding windows.
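To make the time-based variant concrete, here is a minimal sketch in plain Python. It assumes a hypothetical stream of six events spread over ten seconds (the `EVENTS` list), and that each window covers the half-open interval (T - size, T], where T is the time of the event that triggered it; `sliding_time_windows` is an illustrative name, not the API of any streaming engine. Because a window is produced only when an event arrives, it always contains at least that event, which is why a real sliding window never outputs an empty set.

```python
from collections import deque

# Hypothetical example stream: (timestamp in seconds, payload) pairs, sorted by time.
EVENTS = [(1, "a"), (2, "b"), (4, "c"), (6, "d"), (7, "e"), (9, "f")]


def sliding_time_windows(events, size):
    """After every event, yield (event_time, payloads from the last `size` seconds).

    The window ending at time T covers the half-open interval (T - size, T].
    """
    buffer = deque()
    for timestamp, payload in events:
        buffer.append((timestamp, payload))
        # Evict events that fell out of the window. The newest event always
        # stays (timestamp > timestamp - size), so a window is never empty.
        while buffer[0][0] <= timestamp - size:
            buffer.popleft()
        yield timestamp, [p for _, p in buffer]


for end, window in sliding_time_windows(EVENTS, size=5):
    print(end, window)
```

With these events, the window emitted at second 6 no longer contains `a`, because `a` arrived at second 1 and the interval (1, 6] excludes it.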

Eviction-based sliding windows always contain n elements. For example, when I apply the sliding window function to get five-element slices of the previous events, I am going to get the following result:
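The eviction-based variant can be sketched with a bounded `collections.deque`, which drops its oldest element whenever a new one is appended to a full deque. This is a minimal illustration under stated assumptions: the `EVENTS` list is the same kind of hypothetical six-event stream, `sliding_count_windows` is an illustrative name, and the sketch waits until n events have arrived before emitting anything, so every window holds exactly n elements.

```python
from collections import deque

# Hypothetical example stream: (timestamp in seconds, payload) pairs, sorted by time.
EVENTS = [(1, "a"), (2, "b"), (4, "c"), (6, "d"), (7, "e"), (9, "f")]


def sliding_count_windows(events, n):
    """After every event, yield the payloads of the last `n` events.

    Windows are emitted only once `n` events have arrived, so each one
    contains exactly `n` elements.
    """
    buffer = deque(maxlen=n)  # appending to a full deque evicts the oldest item
    for _, payload in events:
        buffer.append(payload)
        if len(buffer) == n:
            yield list(buffer)


for window in sliding_count_windows(EVENTS, n=5):
    print(window)
```

Note that timestamps play no role here: the window moves by one element per event, no matter how far apart in time the events are.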

The third kind of window function is the hopping window. As the name suggests, it is a window function that “jumps.” Because of that, we must specify both the length of the window and the length of the jump. The two values do not need to be equal!

For example, I can specify 2.5-second-long jumps and a five-second-long window:
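A hopping window can be sketched by generating window boundaries every `hop` seconds and collecting the events inside each one. The sketch below assumes windows aligned to second 0 (so they start at 0, 2.5, 5, 7.5), half-open intervals [start, start + size), and the same hypothetical `EVENTS` stream; `hopping_windows` and its `stream_end` parameter are illustrative names. It deliberately returns every window, including empty ones, to show where empty results can come from; some engines skip windows that received no events.

```python
# Hypothetical example stream: (timestamp in seconds, payload) pairs, sorted by time.
EVENTS = [(1, "a"), (2, "b"), (4, "c"), (6, "d"), (7, "e"), (9, "f")]


def hopping_windows(events, size, hop, stream_end):
    """Return ((start, end), payloads) for windows [start, start + size).

    Windows start at 0, hop, 2 * hop, ... while start < stream_end.
    """
    if size <= 0 or hop <= 0:
        raise ValueError("size and hop must be positive")
    windows = []
    k = 0
    # Compute each start as k * hop to avoid accumulating floating-point error.
    while k * hop < stream_end:
        start = k * hop
        end = start + size
        payloads = [p for t, p in events if start <= t < end]
        windows.append(((start, end), payloads))
        k += 1
    return windows


for bounds, window in hopping_windows(EVENTS, size=5, hop=2.5, stream_end=10):
    print(bounds, window)
```

Because the jump is shorter than the window, event `c` (second 4) lands in two windows. With a jump longer than the window, the windows leave gaps: events that fall between them belong to no window at all, and windows covering quiet periods come back empty.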

The last kind of window function is the tumbling window. It is a hopping window whose “jump” equals its length. In this case, my example events get grouped into only two windows:
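When the jump equals the window length, the windows neither overlap nor leave gaps, so each event belongs to exactly one window, and its index is simply floor(timestamp / size). The sketch below relies on that observation instead of generating overlapping boundaries. As before, the `EVENTS` stream and the `tumbling_windows` name are hypothetical, windows are assumed to be aligned to second 0, and only windows that received at least one event are returned.

```python
import math
from collections import defaultdict

# Hypothetical example stream: (timestamp in seconds, payload) pairs, sorted by time.
EVENTS = [(1, "a"), (2, "b"), (4, "c"), (6, "d"), (7, "e"), (9, "f")]


def tumbling_windows(events, size):
    """Assign each event to exactly one window [k * size, (k + 1) * size)."""
    buckets = defaultdict(list)
    for timestamp, payload in events:
        k = math.floor(timestamp / size)  # index of the only window containing the event
        buckets[k].append(payload)
    return [((k * size, (k + 1) * size), buckets[k]) for k in sorted(buckets)]


for bounds, window in tumbling_windows(EVENTS, size=5):
    print(bounds, window)
```

With a five-second size over the ten-second stream, this yields two windows, [0, 5) and [5, 10), and each event appears in exactly one of them.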

Did you enjoy reading this article?
Would you like to learn more about software craft in data engineering and MLOps?

Subscribe to the newsletter or add this blog to your RSS reader (does anyone still use them?) to get a notification when I publish a new essay!

Bartosz Mikulski

  • Data/MLOps engineer by day
  • DevRel/copywriter by night
  • Python and data engineering trainer
  • Conference speaker
  • Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
  • Twitter: @mikulskibartosz