Pandas stack and unstack explained
In this blog post, I am going to show you how to use the stack and unstack functions to get data we want without filtering the data frame.
Input
Before we begin, we have to define a data frame. Imagine that we are working at a university and we are responsible for scheduling exams. We have a dataset of all student groups, the number of students in each group, and data whether you have already scheduled the exam.
We have created a data frame indexed by the faculty name and the group id.
1
2
3
4
5
6
7
8
9
10
11
12
import pandas as pd
data = pd.DataFrame([
['Mathematics', 'A', 34, True],
['Mathematics', 'B', 27, False],
['Mathematics', 'C', 29, True],
['Chemistry', 'A', 22, True],
['Chemistry', 'B', 18, False],
['Chemistry', 'C', 25, True]
], columns = ['faculty', 'group_id', 'number_of_students', 'exam_scheduled'])
data = data.set_index(['faculty', 'group_id'])
1
2
3
4
5
6
7
8
9
10
11
print(data)
# Output:
# number_of_students exam_scheduled
#faculty group_id
#Mathematics A 34 True
# B 27 False
# C 29 True
#Chemistry A 22 True
# B 18 False
# C 25 True
Turn an index into column
When we call the unstack function, the innermost index (in this case the “group_id” becomes a subcolumn of the remaining columns.
1
2
3
4
5
6
7
8
9
data.unstack()
# Output:
# number_of_students exam_scheduled
#group_id A B C A B C
#faculty
#Chemistry 22 18 25 True False True
#Mathematics 34 27 29 True False True
Now, when we want to get the number of students in group A, we have to write:
1
2
3
4
5
6
7
8
data.unstack()['number_of_students']['A']
# Output:
#faculty
#Chemistry 22
#Mathematics 34
#Name: A, dtype: int64
As you see, we get the number of students in every faculty that has a group identified by “A.”
Turn a data frame into a nested lookup table
The stack function can be used to turn a data frame into a nested lookup table. It returns a series which is indexed by the data frame indexes + the columns of the data frame.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
data.stack()
# Output:
#aculty group_id
#Mathematics A number_of_students 34
# exam_scheduled True
# B number_of_students 27
# exam_scheduled False
# C number_of_students 29
# exam_scheduled True
#Chemistry A number_of_students 22
# exam_scheduled True
# B number_of_students 18
# exam_scheduled False
# C number_of_students 25
# exam_scheduled True
#dtype: object
Because of that, we can get the values from stacked data frame like this:
1
2
3
4
5
data.stack()['Mathematics']['A']['number_of_students']
# Output:
#34
Did you enjoy reading this article?
Would you like to learn more about leveraging AI to drive growth and innovation, software craft in data engineering, and MLOps?
Subscribe to the newsletter or add this blog to your RSS reader (does anyone still use them?) to get a notification when I publish a new essay!
You may also like

Bartosz Mikulski
- MLOps engineer by day
- AI and data engineering consultant by night
- Python and data engineering trainer
- Conference speaker
- Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
- Twitter: @mikulskibartosz
- Mastodon: @mikulskibartosz@mathstodon.xyz