How to prevent Airflow from backfilling old DAG runs
Sometimes it makes no sense to backfill old DAG runs: for example, when we retrieve data from a REST API that always returns the current state, when we use Airflow to send a newsletter, or when the DAG computes the entire data history on every run, so the execution date does not matter.
In Airflow, there are two ways to prevent the DAG from backfilling old runs.
We can set the `catchup` parameter of a DAG to `False`. In this case, Airflow will never create DAG runs with an execution date in the past.

```python
dag = DAG(
    'example_dag',
    catchup=False,
    ...  # other parameters
)
```
The second method is to include the `LatestOnlyOperator` in the DAG. This operator skips all of its downstream tasks when the current run is not the latest one, which is useful when we want to backfill only some of the tasks and skip the others. To understand how to use the `LatestOnlyOperator`, take a look at this blog post.
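As a minimal sketch of the second approach (assuming Airflow 2.x import paths; the DAG id and task ids are hypothetical), the `LatestOnlyOperator` sits upstream of the tasks that should run only in the most recent run, while tasks that are not downstream of it still run in every backfilled run:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.latest_only import LatestOnlyOperator

with DAG(
    'latest_only_example',  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval='@daily',
) as dag:
    latest_only = LatestOnlyOperator(task_id='latest_only')

    # Not downstream of latest_only, so it runs in every
    # DAG run, including backfilled ones.
    backfillable = BashOperator(
        task_id='backfillable', bash_command='echo always'
    )

    # Downstream of latest_only, so it is skipped in every
    # run except the most recent one.
    current_only = BashOperator(
        task_id='current_only', bash_command='echo latest'
    )

    latest_only >> current_only
```

This way, backfills still execute `backfillable` for every past interval, while `current_only` runs only in the latest interval.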
Did you enjoy reading this article?
Would you like to learn more about software craft in data engineering and MLOps?
Subscribe to the newsletter or add this blog to your RSS reader (does anyone still use them?) to get a notification when I publish a new essay!