Use the ROW_NUMBER() function to get top rows by partition in Hive

Architectural Deep Dives: Designing deterministic RAG pipelines and Agentic workflows using schema enforcement (BAML).
Production MLOps: Real-world strategies for evaluation, FinOps, and "Shift Left" data quality.
System Reliability: Post-mortems on why AI systems fail at scale and how to prevent it.

Imagine that I have a SQL query that calculates the number of rows grouped by some column:

SELECT column_A, count(*) as cnt
FROM table_name
GROUP BY column_A

I want to get the top 10 rows in every group. To do this, I have to use the query above as a subquery and use the ROW_NUMBER() function and a window function. The window function will group the rows by the column_A and order them by the number of rows:

SELECT t.*, ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY cnt DESC) as rank FROM (
    SELECT column_A, count(*) as cnt
    FROM table_name
    GROUP BY column_A
) t

To get only the top 10 rows, I am going to put all of that in another subquery and use WHERE to select only top values:

SELECT t2.column_A as column_A, t2.cnt as cnt FROM (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY column_A ORDER BY cnt DESC) as rank FROM (
        SELECT column_A, count(*) as cnt
        FROM table_name
        GROUP BY column_A
        ) t
    ) t2
WHERE t2.rank <= 10;