Row number in Apache Spark window — row_number, rank, and dense_rank
This article is mostly a “note to self” because I don’t want to google that anymore ;)
Which function should we use to rank the rows within a window in Apache Spark data frame?
It depends on the expected output. row_number is going to sort the output by the column specified in orderBy function and return the index of the row (human-readable, so starts from 1).
The only difference between rank and dense_rank is the fact that the rank function is going to skip the numbers if there are duplicates assigned to the same rank. In the same situation, the dense_rank function uses the next number in a sequence.
I found this great example on StackOverflow that seems to explain everything:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
val df = Seq(("a", 10), ("a", 10), ("a", 20)).toDF("col1", "col2")
val windowSpec = Window.partitionBy("col1").orderBy("col2")
df
.withColumn("rank", rank().over(windowSpec))
.withColumn("dense_rank", dense_rank().over(windowSpec))
.withColumn("row_number", row_number().over(windowSpec)).show
+----+----+----+----------+----------+
|col1|col2|rank|dense_rank|row_number|
+----+----+----+----------+----------+
| a| 10| 1| 1| 1|
| a| 10| 1| 1| 2|
| a| 20| 3| 2| 3|
+----+----+----+----------+----------+
Source: https://stackoverflow.com/questions/44968912/difference-in-dense-rank-and-row-number-in-spark
Did you enjoy reading this article?
Would you like to learn more about leveraging AI to drive growth and innovation, software craft in data engineering, and MLOps?
Subscribe to the newsletter or add this blog to your RSS reader (does anyone still use them?) to get a notification when I publish a new essay!
You may also like

Bartosz Mikulski
- MLOps engineer by day
- AI and data engineering consultant by night
- Python and data engineering trainer
- Conference speaker
- Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
- Twitter: @mikulskibartosz
- Mastodon: @mikulskibartosz@mathstodon.xyz