Pandas stack and unstack explained

Pandas stack and unstack explained

In this blog post, I am going to show you how to use the stack and unstack functions to get data we want without filtering the data frame.

Input

Before we begin, we have to define a data frame. Imagine that we are working at a university and we are responsible for scheduling exams. We have a dataset of all student groups, the number of students in each group, and data whether you have already scheduled the exam.

We have created a data frame indexed by the faculty name and the group id.

1
2
3
4
5
6
7
8
9
10
11
12
import pandas as pd

data = pd.DataFrame([
    ['Mathematics', 'A', 34, True],
    ['Mathematics', 'B', 27, False],
    ['Mathematics', 'C', 29, True],
    ['Chemistry', 'A', 22, True],
    ['Chemistry', 'B', 18, False],
    ['Chemistry', 'C', 25, True]
], columns = ['faculty', 'group_id', 'number_of_students', 'exam_scheduled'])

data = data.set_index(['faculty', 'group_id'])
1
2
3
4
5
6
7
8
9
10
11
print(data)
# Output:

#                      number_of_students  exam_scheduled
#faculty     group_id                                    
#Mathematics A                         34            True
#            B                         27           False
#            C                         29            True
#Chemistry   A                         22            True
#            B                         18           False
#            C                         25            True

Turn an index into column

When we call the unstack function, the innermost index (in this case the “group_id” becomes a subcolumn of the remaining columns.

1
2
3
4
5
6
7
8
9
data.unstack()

# Output:

#            number_of_students         exam_scheduled             
#group_id                     A   B   C              A      B     C
#faculty                                                           
#Chemistry                   22  18  25           True  False  True
#Mathematics                 34  27  29           True  False  True

Now, when we want to get the number of students in group A, we have to write:

1
2
3
4
5
6
7
8
data.unstack()['number_of_students']['A']

# Output:

#faculty
#Chemistry      22
#Mathematics    34
#Name: A, dtype: int64

As you see, we get the number of students in every faculty that has a group identified by “A.”

Turn a data frame into a nested lookup table

The stack function can be used to turn a data frame into a nested lookup table. It returns a series which is indexed by the data frame indexes + the columns of the data frame.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
data.stack()

# Output:

#aculty      group_id                    
#Mathematics  A         number_of_students       34
#                       exam_scheduled         True
#             B         number_of_students       27
#                       exam_scheduled        False
#             C         number_of_students       29
#                       exam_scheduled         True
#Chemistry    A         number_of_students       22
#                       exam_scheduled         True
#             B         number_of_students       18
#                       exam_scheduled        False
#             C         number_of_students       25
#                       exam_scheduled         True
#dtype: object

Because of that, we can get the values from stacked data frame like this:

1
2
3
4
5
data.stack()['Mathematics']['A']['number_of_students']

# Output:

#34

Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you watch programming live streams, check out my YouTube channel.
You can also follow me on Twitter: @mikulskibartosz

If you want to hire me, send me a message on LinkedIn or Twitter.


Bartosz Mikulski
Bartosz Mikulski * data scientist / software/data engineer * conference speaker * organizer of School of A.I. meetups in Poznań * co-founder of Software Craftsmanship Poznan & Poznan Scala User Group