How Airflow scheduler works

Do you think that Airflow scheduler is unintuitive? Have you tried to use intervals and the only method that worked was “trial and error?” Does it look like the only way to set it up correctly is tweaking the settings until somehow you get it right? If yes, this is a blog post for you ;)

I am going to explain the Airflow scheduler and maybe, just maybe, help you understand how it works.

Example

Imagine that Airflow is a person with a timer. Usually, we use Airflow to move data around, and recently I published summaries of the Data Janitor talks, so let’s assume that Airflow is a janitor.

We set the start_date to 9 am today and the interval to “@hourly”. It means that we have told Airflow to start working at 9 am and move some boxes at the beginning of every hour.

Our janitor comes to work at 9 am as expected. Do you think that he/she starts working right away? Obviously no. First, the coffee has to be made. After that, there is a time for watching or reading the news. No work is done between 9 am and 10 am.

What happens at 10 am? The “hourly” interval has passed, and now it is time to move some data around. The same happens at 11 am and later — every hour. At this, point my example breaks down because the real janitor goes home at 5 pm, but Airflow is going to continue working at repeat the same job every hour. Continuously, even at night.

Start date

The problem is that “start_date” parameter is counterintuitive. The job does not start at this time. So what starts?

In Airflow start_date is the time when Airflow starts the timer. It just starts measuring the time.

Did you enjoy reading this article?
Would you like to learn more about software craft in data engineering and MLOps?

Subscribe to the newsletter or add this blog to your RSS reader (does anyone still use them?) to get a notification when I publish a new essay!

Newsletter

Do you enjoy reading my articles?
Subscribe to the newsletter if you don't want to miss the new content, business offers, and free training materials.

Bartosz Mikulski

Bartosz Mikulski

  • Data/MLOps engineer by day
  • DevRel/copywriter by night
  • Python and data engineering trainer
  • Conference speaker
  • Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
  • Twitter: @mikulskibartosz
Newsletter

Do you enjoy reading my articles?
Subscribe to the newsletter if you don't want to miss the new content, business offers, and free training materials.