Pandas stack and unstack explained
In this blog post, I am going to show you how to use the stack and unstack functions to get data we want without filtering the data frame.
Input
Before we begin, we have to define a data frame. Imagine that we are working at a university and we are responsible for scheduling exams. We have a dataset of all student groups, the number of students in each group, and data whether you have already scheduled the exam.
We have created a data frame indexed by the faculty name and the group id.
1
2
3
4
5
6
7
8
9
10
11
12
import pandas as pd
data = pd.DataFrame([
['Mathematics', 'A', 34, True],
['Mathematics', 'B', 27, False],
['Mathematics', 'C', 29, True],
['Chemistry', 'A', 22, True],
['Chemistry', 'B', 18, False],
['Chemistry', 'C', 25, True]
], columns = ['faculty', 'group_id', 'number_of_students', 'exam_scheduled'])
data = data.set_index(['faculty', 'group_id'])
1
2
3
4
5
6
7
8
9
10
11
print(data)
# Output:
# number_of_students exam_scheduled
#faculty group_id
#Mathematics A 34 True
# B 27 False
# C 29 True
#Chemistry A 22 True
# B 18 False
# C 25 True

Turn an index into column
When we call the unstack function, the innermost index (in this case the “group_id” becomes a subcolumn of the remaining columns.
1
2
3
4
5
6
7
8
9
data.unstack()
# Output:
# number_of_students exam_scheduled
#group_id A B C A B C
#faculty
#Chemistry 22 18 25 True False True
#Mathematics 34 27 29 True False True
Now, when we want to get the number of students in group A, we have to write:
1
2
3
4
5
6
7
8
data.unstack()['number_of_students']['A']
# Output:
#faculty
#Chemistry 22
#Mathematics 34
#Name: A, dtype: int64
As you see, we get the number of students in every faculty that has a group identified by “A.”
Turn a data frame into a nested lookup table
The stack function can be used to turn a data frame into a nested lookup table. It returns a series which is indexed by the data frame indexes + the columns of the data frame.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
data.stack()
# Output:
#aculty group_id
#Mathematics A number_of_students 34
# exam_scheduled True
# B number_of_students 27
# exam_scheduled False
# C number_of_students 29
# exam_scheduled True
#Chemistry A number_of_students 22
# exam_scheduled True
# B number_of_students 18
# exam_scheduled False
# C number_of_students 25
# exam_scheduled True
#dtype: object
Because of that, we can get the values from stacked data frame like this:
1
2
3
4
5
data.stack()['Mathematics']['A']['number_of_students']
# Output:
#34
You may also like
Remember to share on social media! If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.
If you want to contact me, send me a message on LinkedIn or Twitter.
Would you like to have a call and talk? Please schedule a meeting using this link.