Making OFFSET LIMIT queries in AWS Athena
This article is a part of my "100 data engineering tutorials in 100 days" challenge. (68/100)
There is no OFFSET support in AWS Athena, but we can use a workaround to get the same behavior.
Let’s assume that I want to execute this query:
1
2
3
SELECT * FROM some_table
ORDER BY column_A
OFFSET 20 LIMIT 10 -- this doesn't work
to get the desired outcome, I need three things:
- a window function that assigns a number to every row
- ordering method that sorts the table, so I get a deterministic outcome
- filtering function that selects only the rows I want
1
2
3
SELECT * FROM (
SELECT row_number() over(ORDER BY column_A) AS rn, * FROM some_table)
WHERE rn BETWEEN 20 AND 30; -- 30 = 20 (offset) + 10 (limit)
You may also like
Remember to share on social media! If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.
If you want to contact me, send me a message on LinkedIn or Twitter.
Would you like to have a call and talk? Please schedule a meeting using this link.