How to find the Hive partition closest to a given date

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (71/100)

Airflow has a built-in function that finds the Hive partition closest to a given date. However, it works only with partition identifiers in the YYYY-mm-dd format, so it will not help you if you partition your tables differently.

To find the closest Hive partition, we should use the closest_ds_partition function:

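# In Airflow 2.x, the same macro is provided by the Hive provider package:
# from airflow.providers.apache.hive.macros.hive import closest_ds_partition
from airflow.macros.hive import closest_ds_partition

closest_ds_partition(
    hive_table_name,  # name of the partitioned Hive table
    the_date,         # date string in the YYYY-mm-dd format
    before=True,      # True, False, or None (explained below)
    schema='hive_schema',
    metastore_conn_id='metastore_connection_id'
)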

Be careful with the before parameter because it behaves in an unexpected way. As you may expect, True returns the closest partition before the given date, and False returns the closest partition after it. When before is set to None, however, the function returns the closest partition on either side of the given date, no matter whether it is earlier or later.
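To make the three cases concrete, here is a minimal pure-Python sketch of the selection logic described above. It does not call Airflow or Hive. The pick_partition helper and the sample partition dates are hypothetical names made up for illustration, and the sketch assumes that a partition matching the given date exactly counts as a valid result in every mode.

```python
from datetime import date


def pick_partition(partitions, target, before=True):
    """Hypothetical helper that mimics the three-valued `before` flag."""
    if before is None:
        # Either side: the smallest absolute distance from the target wins.
        return min(partitions, key=lambda d: abs(d - target))
    if before:
        # Assumption: the target date itself is an acceptable match.
        candidates = [d for d in partitions if d <= target]
        return max(candidates) if candidates else None
    candidates = [d for d in partitions if d >= target]
    return min(candidates) if candidates else None


partitions = [date(2020, 1, 1), date(2020, 1, 10), date(2020, 1, 12)]

# True looks backwards, False looks forwards.
earlier = pick_partition(partitions, date(2020, 1, 3), before=True)
later = pick_partition(partitions, date(2020, 1, 3), before=False)

# None may go in either direction, depending on which partition is nearer.
nearest_backwards = pick_partition(partitions, date(2020, 1, 3), before=None)
nearest_forwards = pick_partition(partitions, date(2020, 1, 9), before=None)
```

With the same call and before=None, the result moves from the earlier partition to the later one as soon as the given date gets closer to the later partition, which is easy to miss when reading the call site.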

Please don’t follow this coding practice. Three-valued “boolean” logic is a terrible, terrible idea. It is far better to use an enum with descriptive names.
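As a rough sketch of what such an enum could look like, the snippet below defines descriptive names and translates them into the values the before parameter accepts. PartitionSide and to_before_flag are hypothetical names invented for this example; they are not part of Airflow.

```python
from enum import Enum


class PartitionSide(Enum):
    """Hypothetical enum: which side of the given date to search."""
    BEFORE = "before"
    AFTER = "after"
    EITHER = "either"


def to_before_flag(side):
    """Translate a descriptive enum member into the True/False/None flag."""
    mapping = {
        PartitionSide.BEFORE: True,
        PartitionSide.AFTER: False,
        PartitionSide.EITHER: None,
    }
    return mapping[side]
```

In your own code, you could wrap the Airflow call in a function that accepts a PartitionSide and passes before=to_before_flag(side), so that readers of the call site see PartitionSide.EITHER instead of a bare None.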

Bartosz Mikulski

  • Data/MLOps engineer by day
  • DevRel/copywriter by night
  • Python and data engineering trainer
  • Conference speaker
  • Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
  • Twitter: @mikulskibartosz