Pandas stack and unstack explained

In this blog post, I am going to show you how to use the stack and unstack functions to get data we want without filtering the data frame.

Input

Before we begin, we have to define a data frame. Imagine that we are working at a university and we are responsible for scheduling exams. We have a dataset of all student groups, the number of students in each group, and data whether you have already scheduled the exam.

We have created a data frame indexed by the faculty name and the group id.

1
2
3
4
5
6
7
8
9
10
11
12
import pandas as pd

data = pd.DataFrame([
    ['Mathematics', 'A', 34, True],
    ['Mathematics', 'B', 27, False],
    ['Mathematics', 'C', 29, True],
    ['Chemistry', 'A', 22, True],
    ['Chemistry', 'B', 18, False],
    ['Chemistry', 'C', 25, True]
], columns = ['faculty', 'group_id', 'number_of_students', 'exam_scheduled'])

data = data.set_index(['faculty', 'group_id'])
1
2
3
4
5
6
7
8
9
10
11
print(data)
# Output:

#                      number_of_students  exam_scheduled
#faculty     group_id                                    
#Mathematics A                         34            True
#            B                         27           False
#            C                         29            True
#Chemistry   A                         22            True
#            B                         18           False
#            C                         25            True

Subscribe to the newsletter and join the free email course.

Turn an index into column

When we call the unstack function, the innermost index (in this case the “group_id” becomes a subcolumn of the remaining columns.

1
2
3
4
5
6
7
8
9
data.unstack()

# Output:

#            number_of_students         exam_scheduled             
#group_id                     A   B   C              A      B     C
#faculty                                                           
#Chemistry                   22  18  25           True  False  True
#Mathematics                 34  27  29           True  False  True

Now, when we want to get the number of students in group A, we have to write:

1
2
3
4
5
6
7
8
data.unstack()['number_of_students']['A']

# Output:

#faculty
#Chemistry      22
#Mathematics    34
#Name: A, dtype: int64

As you see, we get the number of students in every faculty that has a group identified by “A.”

Turn a data frame into a nested lookup table

The stack function can be used to turn a data frame into a nested lookup table. It returns a series which is indexed by the data frame indexes + the columns of the data frame.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
data.stack()

# Output:

#aculty      group_id                    
#Mathematics  A         number_of_students       34
#                       exam_scheduled         True
#             B         number_of_students       27
#                       exam_scheduled        False
#             C         number_of_students       29
#                       exam_scheduled         True
#Chemistry    A         number_of_students       22
#                       exam_scheduled         True
#             B         number_of_students       18
#                       exam_scheduled        False
#             C         number_of_students       25
#                       exam_scheduled         True
#dtype: object

Because of that, we can get the values from stacked data frame like this:

1
2
3
4
5
data.stack()['Mathematics']['A']['number_of_students']

# Output:

#34

Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you want to contact me, send me a message on LinkedIn or Twitter.

Would you like to have a call and talk? Please schedule a meeting using this link.


Bartosz Mikulski
Bartosz Mikulski * data/machine learning engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group

Subscribe to the newsletter and get access to my free email course on building trustworthy data pipelines.

Do you want to work with me at riskmethods?

REMOTE position (available in Poland or Germany)