How to find the Hive partition closest to a given date

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (71/100)

In Airflow, there is a built-in function, which we can use to find the Hive partition closest to the given date. However, it works only with partition identifiers in the YYYY-mm-dd format, so if you use a different partitioning method, this function will not help you.

To find the closest Hive partition, we should use the closest_ds_partition function:

from airflow.macros.hive import closest_ds_partition


Be careful with the before parameter. It has a weird behavior. As you may expect, True means a partition before the given date, False returns the partition after a given date, but when the before parameter is set to None it returns the closest partition, and it does not matter whether it is before or after the given date.

Please don’t follow this coding practice. Three value “boolean” logic is a terrible, terrible idea. It is way better to use an enum with descriptive names.

Would you like to help fight youth unemployment while getting mentoring experience?

Develhope is looking for tutors (part-time, freelancers) for their upcoming Data Engineer Courses.

The role of a tutor is to be the point of contact for students, guiding them throughout the 6-month learning program. The mentor supports learners through 1:1 meetings, giving feedback on assignments, and responding to messages in Discord channels—no live teaching sessions.

Expected availability: 15h/week. You can schedule the 1:1 sessions whenever you want, but the sessions must happen between 9 - 18 (9 am - 6 pm) CEST Monday-Friday.

Check out their job description.

(free advertisement, no affiliate links)

Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you want to contact me, send me a message on LinkedIn or Twitter.

Bartosz Mikulski
Bartosz Mikulski * MLOps Engineer / data engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group

Subscribe to the newsletter and get access to my free email course on building trustworthy data pipelines.