How to assign rows to ranked groups in AWS Athena
A ranked group is a method of grouping rows in the following way:
- First, we have to order the rows by a column.
- After that, we determine how many rows should be in every group. All of the groups (except the last one) must have the same number of rows.
- In the end, we traverse the ordered rows and assign them to groups one by one.
For example, let’s assume that I have numbers 3, 5, 1, 8, 2, 4, 7, 9, 0, and I want to assign them to three groups.
In the first step, I have to order them 0, 1, 2, 3, 4, 5, 7, 8, 9. I calculate the size of the group. I have nine numbers, and I want three groups, so every group contains three numbers. Finally, I assign the numbers to groups:
- Group I: 0, 1, 2
- Group II, 3, 4, 5
- Group III: 7, 8, 9
How do we do that in Athena? Fortunately, there is a built-in function NTILE:
1 2 3 SELECT t.*,, NTILE(3) OVER (ORDER BY col) FROM table t;
Did you enjoy reading this article?
Would you like to learn more about leveraging AI to drive growth and innovation, software craft in data engineering, and MLOps?
Subscribe to the newsletter or add this blog to your RSS reader (does anyone still use them?) to get a notification when I publish a new essay!
You may also like
- MLOps engineer by day
- AI and data engineering consultant by night
- Python and data engineering trainer
- Conference speaker
- Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
- Twitter: @mikulskibartosz
- Mastodon: @firstname.lastname@example.org