How to check whether a regular expression matches a string in Hive

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (78/100)

The other day, I was porting an Athena query to run it on Hive, and I realized that the regexp_like function does not exist in Hive. Fortunately, there is a nice replacement that offers precisely the same behavior.

First, lets take a look at the regexp_like function in Presto/Athena:

regexp_like(column_name, 'regexp')

For example, we can use it like this to return the value of another column when a value in the row matches the regexp and return 0 when it doesn’t:

CASE WHEN (regexp_like(column_name, '^[0-9]+$')) THEN cast(other_column as double) ELSE 0 END

To get the same behavior in Hive, I have to use the RLIKE keyword. Instead of WHEN CASE with only two branches, I can use the IF function:

IF(column_name RLIKE '^[0-9]+$', cast (other_column as double), 0)

Subscribe to the newsletter and join the free email course.

Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you want to contact me, send me a message on LinkedIn or Twitter.

Would you like to have a call and talk? Please schedule a meeting using this link.

Bartosz Mikulski
Bartosz Mikulski * MLOps Engineer / data engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group

Subscribe to the newsletter and get access to my free email course on building trustworthy data pipelines.