How to assign rows to ranked groups in AWS Athena

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (81/100)

A ranked group is a method of grouping rows in the following way:

  • First, we have to order the rows by a column.
  • After that, we determine how many rows should be in every group. All of the groups (except the last one) must have the same number of rows.
  • In the end, we traverse the ordered rows and assign them to groups one by one.

For example, let’s assume that I have numbers 3, 5, 1, 8, 2, 4, 7, 9, 0, and I want to assign them to three groups.

In the first step, I have to order them 0, 1, 2, 3, 4, 5, 7, 8, 9. I calculate the size of the group. I have nine numbers, and I want three groups, so every group contains three numbers. Finally, I assign the numbers to groups:

  • Group I: 0, 1, 2
  • Group II, 3, 4, 5
  • Group III: 7, 8, 9

How do we do that in Athena? Fortunately, there is a built-in function NTILE:

1
2
3
SELECT t.*,,
       NTILE(3) OVER (ORDER BY col)
FROM table t;

Subscribe to the newsletter and join the free email course.


Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you want to contact me, send me a message on LinkedIn or Twitter.

Would you like to have a call and talk? Please schedule a meeting using this link.


Bartosz Mikulski
Bartosz Mikulski * MLOps Engineer / data engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group

Subscribe to the newsletter and get access to my free email course on building trustworthy data pipelines.