How to run Airflow in Docker (with a persistent database)

In this blog post, I am going to show you how to prepare a minimalist setup of the puckel/docker-airflow Docker image that runs a single DAG and stores the Airflow database persistently, so we will not lose the task history during restarts of the Docker container.

Firstly, we have to pull the Docker image and prepare the minimalist DAG configuration. The simplest DAG consists of a single DummyOperator. The content of the file is listed below:

from datetime import datetime
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

# Run the DAG once a day at 12:00, starting from 2019-07-10
dag = DAG('minimalist_dag', description='The simplest DAG',
          schedule_interval='0 12 * * *',
          start_date=datetime(2019, 7, 10))

# A single task that does nothing - just enough to see the DAG in the UI
dummy_operator = DummyOperator(task_id='dummy_task', retries=1, dag=dag)

I am going to save the code in a file called minimalist.py in the /home/user/airflow/dags directory (you will need the full path of the directory where you saved the file in a moment).
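
Before we start the container, we can check that the file parses correctly by importing it with a local Python interpreter (a quick sanity check, assuming you have Airflow installed on the host; otherwise skip it and rely on the Airflow UI to report import errors):

mkdir -p /home/user/airflow/dags
# importing the module is enough to catch syntax errors and missing imports
python /home/user/airflow/dags/minimalist.py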

To pass the DAG configuration to the Airflow instance, we need to map the local directory to a directory inside the Docker container using a volume, so we have to add this parameter to the docker run command:

-v /home/user/airflow/dags:/usr/local/airflow/dags

By default, the Airflow metadata database (which stores the DAG runs and task history) is the /usr/local/airflow/airflow.db SQLite file. I cannot map the whole /usr/local/airflow directory to a local directory, because that would break the setup (the entire Airflow configuration is there, and I don't want to override it).
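
If you want to see where the database ends up by default, you can inspect the configuration baked into the image (an inspection sketch; I am assuming the configuration file sits at the usual /usr/local/airflow/airflow.cfg location inside this image, and --entrypoint bypasses the image's startup script so that only cat runs):

docker run --rm --entrypoint cat puckel/docker-airflow /usr/local/airflow/airflow.cfg | grep sql_alchemy_conn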

To deal with that problem, I have to change the location of the database file. That can be done with the AIRFLOW__CORE__SQL_ALCHEMY_CONN environment variable, which overrides the sql_alchemy_conn setting in the core section of the Airflow configuration.

I am going to change it to sqlite:////usr/local/airflow/db/airflow.db (note that it has to be a full SQLAlchemy connection string, hence the four slashes: three from the sqlite:/// prefix and one from the absolute path), using this parameter:

-e AIRFLOW__CORE__SQL_ALCHEMY_CONN=sqlite:////usr/local/airflow/db/airflow.db
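
Once the container is up, you can double-check that the variable reached the Airflow process (a verification sketch; replace <container-id> with the ID or name reported by docker ps):

docker exec <container-id> env | grep AIRFLOW__CORE__SQL_ALCHEMY_CONN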

After that change, I have to map a local directory to the new database directory inside the container:

-v /home/user/airflow/db:/usr/local/airflow/db
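
It is a good idea to create the local database directory before running the container. If Docker has to create it for you, it will be owned by root, and the airflow user inside the puckel/docker-airflow image may not be able to write the database file into it (a precaution sketch; chmod 777 is the bluntest way to make the directory writable, adjusting the ownership to the container's user works as well):

mkdir -p /home/user/airflow/db
# make sure the airflow user inside the container can create airflow.db here
chmod 777 /home/user/airflow/db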

The full command that runs Airflow in Docker with a custom DAGs directory and a persistent database looks like this:

docker run -d -p 8080:8080 -e AIRFLOW__CORE__SQL_ALCHEMY_CONN=sqlite:////usr/local/airflow/db/airflow.db -v /home/user/airflow/dags:/usr/local/airflow/dags -v /home/user/airflow/db:/usr/local/airflow/db puckel/docker-airflow webserver

Remember to replace /home/user/airflow/ with the full path of your local directory (in both -v parameters).
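
To verify that the setup works, open http://localhost:8080 and check that minimalist_dag shows up in the UI, then confirm that the database file landed on the host and survives a restart (a verification sketch; take the container ID from docker ps):

docker ps                      # note the container id of puckel/docker-airflow
ls -l /home/user/airflow/db    # airflow.db should appear here shortly after startup
docker restart <container-id>
ls -l /home/user/airflow/db    # the file, and the DAG run history with it, is still there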
