How to prevent Airflow from backfilling old DAG runs

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (69/100)

Sometimes it makes no sense to backfill old DAG runs: for example, when we retrieve data from a REST API that always returns the current state, when we use Airflow to send a newsletter, or when every DAG run recomputes the entire data history, so the execution date does not matter.

In Airflow, there are two ways to prevent the DAG from backfilling old runs.

We can set the catchup parameter of the DAG to False. In this case, Airflow will never create DAG runs with an execution date in the past.

dag = DAG(
    'example_dag',
    ...,  # other parameters
    catchup=False,
)

The second method is to include the LatestOnlyOperator inside the DAG. This operator skips all of its downstream tasks when the current run is not the latest scheduled run. This approach is useful when we want to backfill only some of the tasks and skip others. To understand how to use the LatestOnlyOperator, take a look at this blog post.
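As a rough sketch of the second approach (assuming Airflow 2.x, where the operator lives in airflow.operators.latest_only; the task names and commands below are made up for illustration), a DAG definition could look like this:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.latest_only import LatestOnlyOperator

with DAG(
    'example_dag',
    start_date=datetime(2021, 1, 1),
    schedule_interval='@daily',
) as dag:
    latest_only = LatestOnlyOperator(task_id='latest_only')

    # Skipped during backfills: only the latest run sends the newsletter.
    send_newsletter = BashOperator(
        task_id='send_newsletter',
        bash_command='echo "sending newsletter"',
    )

    # Not downstream of latest_only, so it still runs for backfilled dates.
    rebuild_history = BashOperator(
        task_id='rebuild_history',
        bash_command='echo "rebuilding history"',
    )

    latest_only >> send_newsletter
```

Because only send_newsletter sits downstream of the LatestOnlyOperator, backfilled runs execute rebuild_history but skip the newsletter task.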

Subscribe to the newsletter and join the free email course.


If you want to contact me, send me a message on LinkedIn or Twitter.

Bartosz Mikulski * MLOps Engineer / data engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group
