How to retrieve the statuses of the recent DAG executions from Airflow database
If you want to create a nice dashboard that displays the statuses of the most recent Airflow DAG runs, you will need to retrieve them from the Airflow database:
1 2 3 4 5 6 7 8 9 10 11 12 13 SELECT dag_run.dag_id, dag_run.state FROM dag_run INNER JOIN (SELECT dag_id, MAX(execution_date) AS date FROM dag_run GROUP BY dag_id) mx ON dag_run.dag_id = mx.dag_id AND dag_run.execution_date = mx.date JOIN dag ON dag.dag_id = dag_run.dag_id AND is_active = 1 AND is_paused = 0
This query retrieves the most recent DAG run and returns its id and state. Additionally, it filters out the paused DAGs, so we don’t pollute the results with things that don’t run in production right now.
Of course, creating a dashboard of Airflow DAG runs takes some time, so you may prefer to use the project we have open-sourced: https://github.com/Wikia/discreETLy. In addition to DAG execution times and statuses, it also displays Athena tables’ last update time and their descriptions.
You may also like
- How to delay an Airflow DAG until a given hour using the DateTimeSensor
- Why my Airflow tasks got stuck in "no_status" and how I fixed it
- How to run PySpark code using the Airflow SSHOperator
- How to conditionally skip tasks in an Airflow DAG
- How to postpone Airflow DAG until files get uploaded into an S3 bucket