How to retrieve the statuses of the recent DAG executions from Airflow database
If you want to create a nice dashboard that displays the statuses of the most recent Airflow DAG runs, you will need to retrieve them from the Airflow database:
1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT
dag_run.dag_id,
dag_run.state
FROM dag_run
INNER JOIN (SELECT
dag_id,
MAX(execution_date) AS date
FROM dag_run
GROUP BY dag_id) mx
ON
dag_run.dag_id = mx.dag_id
AND dag_run.execution_date = mx.date
JOIN dag ON dag.dag_id = dag_run.dag_id AND is_active = 1 AND is_paused = 0
This query retrieves the most recent DAG run and returns its id and state. Additionally, it filters out the paused DAGs, so we don’t pollute the results with things that don’t run in production right now.
Of course, creating a dashboard of Airflow DAG runs takes some time, so you may prefer to use the project we have open-sourced: https://github.com/Wikia/discreETLy. In addition to DAG execution times and statuses, it also displays Athena tables’ last update time and their descriptions.
Parsing machine learning logs with Ahana, a managed Presto service, and Cube, a headless BI solution

Check out my article published on the Cube.dev blog!
You may also like
- How to postpone Airflow DAG until files get uploaded into an S3 bucket
- How to use AWSAthenaOperator in Airflow to verify that a DAG finished successfully
- Remove a directory from S3 using Airflow S3Hook
- Anomaly detection in Airflow DAG using Prophet library
- How to prevent Airflow from backfilling old DAG runs
Bartosz Mikulski
- Data/MLOps engineer by day
- DevRel/copywriter by night
- Python and data engineering trainer
- Conference speaker
- Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
- Twitter: @mikulskibartosz