Pandas stack and unstack explained

In this blog post, I am going to show you how to use the stack and unstack functions to get data we want without filtering the data frame.

Input

Before we begin, we have to define a data frame. Imagine that we are working at a university and we are responsible for scheduling exams. We have a dataset of all student groups, the number of students in each group, and data whether you have already scheduled the exam.

We have created a data frame indexed by the faculty name and the group id.

1
2
3
4
5
6
7
8
9
10
11
12
import pandas as pd

data = pd.DataFrame([
    ['Mathematics', 'A', 34, True],
    ['Mathematics', 'B', 27, False],
    ['Mathematics', 'C', 29, True],
    ['Chemistry', 'A', 22, True],
    ['Chemistry', 'B', 18, False],
    ['Chemistry', 'C', 25, True]
], columns = ['faculty', 'group_id', 'number_of_students', 'exam_scheduled'])

data = data.set_index(['faculty', 'group_id'])
1
2
3
4
5
6
7
8
9
10
11
print(data)
# Output:

#                      number_of_students  exam_scheduled
#faculty     group_id                                    
#Mathematics A                         34            True
#            B                         27           False
#            C                         29            True
#Chemistry   A                         22            True
#            B                         18           False
#            C                         25            True

Turn an index into column

When we call the unstack function, the innermost index (in this case the “group_id” becomes a subcolumn of the remaining columns.

1
2
3
4
5
6
7
8
9
data.unstack()

# Output:

#            number_of_students         exam_scheduled             
#group_id                     A   B   C              A      B     C
#faculty                                                           
#Chemistry                   22  18  25           True  False  True
#Mathematics                 34  27  29           True  False  True

Now, when we want to get the number of students in group A, we have to write:

1
2
3
4
5
6
7
8
data.unstack()['number_of_students']['A']

# Output:

#faculty
#Chemistry      22
#Mathematics    34
#Name: A, dtype: int64

As you see, we get the number of students in every faculty that has a group identified by “A.”

Turn a data frame into a nested lookup table

The stack function can be used to turn a data frame into a nested lookup table. It returns a series which is indexed by the data frame indexes + the columns of the data frame.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
data.stack()

# Output:

#aculty      group_id                    
#Mathematics  A         number_of_students       34
#                       exam_scheduled         True
#             B         number_of_students       27
#                       exam_scheduled        False
#             C         number_of_students       29
#                       exam_scheduled         True
#Chemistry    A         number_of_students       22
#                       exam_scheduled         True
#             B         number_of_students       18
#                       exam_scheduled        False
#             C         number_of_students       25
#                       exam_scheduled         True
#dtype: object

Because of that, we can get the values from stacked data frame like this:

1
2
3
4
5
data.stack()['Mathematics']['A']['number_of_students']

# Output:

#34

Did you enjoy reading this article?
Would you like to learn more about leveraging AI to drive growth and innovation, software craft in data engineering, and MLOps?

Subscribe to the newsletter or add this blog to your RSS reader (does anyone still use them?) to get a notification when I publish a new essay!

Newsletter

Do you enjoy reading my articles?
Subscribe to the newsletter if you don't want to miss the new content, business offers, and free training materials.

Bartosz Mikulski

Bartosz Mikulski

  • MLOps engineer by day
  • AI and data engineering consultant by night
  • Python and data engineering trainer
  • Conference speaker
  • Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
  • Twitter: @mikulskibartosz
  • Mastodon: @mikulskibartosz@mathstodon.xyz
Newsletter

Do you enjoy reading my articles?
Subscribe to the newsletter if you don't want to miss the new content, business offers, and free training materials.