How to get the value by rank from a grouped Pandas dataframe

How to get the value by rank from a grouped Pandas dataframe

Let’s say that we have a data frame containing all purchases done by all our customers. We want to get the most recent purchase of every client. How do we do it?

We may use the grouping function to get the purchases of every customer and then get the most recent one from every group.

Imagine that the “group_name” column contains the identifier of the customer. The “used_for_sorting” column is our date of purchase, so we want the largest value in the group.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
import pandas as pd
data = pd.DataFrame([
    ['a', 1, 'A'],
    ['a', 2, 'B'],
    ['a', 3, 'C'],
    ['b', 5, 'D'],
    ['b', 6, 'E'],
    ['b', 7, 'F'],
    ['b', 8, 'G'],
    ['c', 10, 'H'],
    ['c', 11, 'I'],
    ['c', 12, 'J'],
    ['c', 13, 'K']
], columns = ['group_name', 'used_for_sorting', 'the_value'])

Get the first value from a group

In Pandas such a solution looks like that.
First, I have to sort the data frame by the “used_for_sorting” column.
I must do it before I start grouping because sorting of a grouped data frame is not supported and the groupby function does not sort the value within the groups, but it preserves the order of rows.

1
sorted_data_frame = data.sort_values(['used_for_sorting'], ascending=False)

Now, I can group the data frame by the customer identifier. In my case, the “group_name” is the customer identifier.

1
grouped_data_frame = sorted_data_frame.groupby('group_name')

After that, I must get the first value from every group. There are two options. I can either use the first function or the nth function with parameter 1. Obviously, I am going to choose the first function.

1
grouped_data_frame.first()

If you don’t want to use the group name as the index, remember to drop it.

All in one line, it looks like this:

1
2
3
4
5
data \
  .sort_values(['used_for_sorting'], ascending = False) \
  .groupby('group_name') \
  .first() \
  .reset_index()

The result:

1
2
3
4
     group_name  used_for_sorting the_value
0          a                 3         C
1          b                 8         G
2          c                13         K

Do you want to show your product/service to 25000 data science enthusiasts every month? I am looking for companies which would like to become a partner of this blog.

Are you interested? Is your employer interested? Here are the details of the offer.

Return the rank

What if instead of returning one row I want to get all of the rows with their rank?

In this case, I have to:

  • group the data frame by the group_name column

  • get the series (column) from the grouped data frame

  • calculate the rank of the series

  • add the rank as a new column in the original data frame

1
2
3
4
5
ranks = data \
  .groupby('group_name')['used_for_sorting'] \
  .rank(ascending = True, method = 'first')
ranks.name = 'rank'
pd.concat([data, ranks], axis = 1)

The result:

1
2
3
4
5
6
7
8
9
10
11
12
    group_name  used_for_sorting the_value  rank
0           a                 1         A   1.0
1           a                 2         B   2.0
2           a                 3         C   3.0
3           b                 5         D   1.0
4           b                 6         E   2.0
5           b                 7         F   3.0
6           b                 8         G   4.0
7           c                10         H   1.0
8           c                11         I   2.0
9           c                12         J   3.0
10          c                13         K   4.0

Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you watch programming live streams, check out my YouTube channel.
You can also follow me on Twitter: @mikulskibartosz

If you want to hire me, send me a message on LinkedIn or Twitter.


If this article was helpful, consider donating to WWF or any other charity of your choice.
Bartosz Mikulski
Bartosz Mikulski * data scientist / software engineer * conference speaker * organizer of School of A.I. meetups in Poznań * co-founder of Software Craftsmanship Poznan & Poznan Scala User Group