How to check whether a regular expression matches a string in Hive

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (78/100)

The other day, I was porting an Athena query to run it on Hive, and I realized that the regexp_like function does not exist in Hive. Fortunately, there is a nice replacement that offers precisely the same behavior.

First, lets take a look at the regexp_like function in Presto/Athena:

regexp_like(column_name, 'regexp')

For example, we can use it like this to return the value of another column when a value in the row matches the regexp and return 0 when it doesn’t:

CASE WHEN (regexp_like(column_name, '^[0-9]+$')) THEN cast(other_column as double) ELSE 0 END

To get the same behavior in Hive, I have to use the RLIKE keyword. Instead of WHEN CASE with only two branches, I can use the IF function:

IF(column_name RLIKE '^[0-9]+$', cast (other_column as double), 0)

Subscribe to the newsletter and join the free email course.

Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you want to contact me, send me a message on LinkedIn or Twitter.

Would you like to have a call and talk? Please schedule a meeting using this link.

Bartosz Mikulski
Bartosz Mikulski * data/machine learning engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group

Subscribe to the newsletter and get access to my free email course on building trustworthy data pipelines.

Do you want to work with me at riskmethods?

REMOTE position (available in Poland or Germany)