How to assign rows to ranked groups in AWS Athena

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (81/100)

A ranked group is a method of grouping rows in the following way:

  • First, we have to order the rows by a column.
  • After that, we determine how many rows should be in every group. All of the groups (except the last one) must have the same number of rows.
  • In the end, we traverse the ordered rows and assign them to groups one by one.

For example, let’s assume that I have numbers 3, 5, 1, 8, 2, 4, 7, 9, 0, and I want to assign them to three groups.

In the first step, I have to order them 0, 1, 2, 3, 4, 5, 7, 8, 9. I calculate the size of the group. I have nine numbers, and I want three groups, so every group contains three numbers. Finally, I assign the numbers to groups:

  • Group I: 0, 1, 2
  • Group II, 3, 4, 5
  • Group III: 7, 8, 9

How do we do that in Athena? Fortunately, there is a built-in function NTILE:

1
2
3
SELECT t.*,,
       NTILE(3) OVER (ORDER BY col)
FROM table t;

Subscribe to the newsletter and join the free email course.


Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you want to contact me, send me a message on LinkedIn or Twitter.

Would you like to have a call and talk? Please schedule a meeting using this link.


Bartosz Mikulski
Bartosz Mikulski * data/machine learning engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group

Subscribe to the newsletter and get access to my free email course on building trustworthy data pipelines.

Do you want to work with me at riskmethods?

REMOTE position (available in Poland or Germany)