Selecting rows in Pandas

Pandas offers several ways of selecting rows from a DataFrame. Let’s take a look at the four most popular ones.

We will start with a DataFrame containing five rows:

   col_A col_B
0      1     A
1      2     B
2      3     C
3      4     D
4      5     E
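The article does not show how this DataFrame is built. A minimal sketch of one way to create it follows; the variable name data matches the later examples, while the construction itself is an assumption:

```python
import pandas as pd

# Assumed construction: the article only shows the printed DataFrame.
data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# Without an explicit index, pandas assigns the default labels 0..4.
print(data)
```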

The loc indexer

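First, we will use loc. Although it is often called a function, loc is an indexer that we use with square brackets, not parentheses. It lets us select rows by their index labels. For example, if we write data.loc[[0,1,4]], we will get the first, the second, and the last row of our DataFrame, because the default index labels are the same as the row positions.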

   col_A col_B
0      1     A
1      2     B
4      5     E
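To make it clearer that loc follows labels and not positions, here is a small sketch. The DataFrame construction and the reordered name are assumptions added for illustration:

```python
import pandas as pd

data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# Reorder the rows; every row keeps its original index label.
reordered = data.sort_values("col_A", ascending=False)

# loc looks up the labels 0, 1, and 4, wherever those rows are now,
# and returns them in the order in which we listed the labels.
selected = reordered.loc[[0, 1, 4]]
print(selected)
```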

Of course, it’s difficult to spot the benefit of using loc when we have the default numeric index, because the labels and the positions are identical. Because of that, we will set the col_B column as the index and use its values to select the rows:

data.set_index('col_B').loc[['A', 'B', 'E']]

       col_A
col_B
A          1
B          2
E          5
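The sketch below repeats this selection step by step. The names indexed, rows, and label_slice are hypothetical, and the label slice is an extra illustration that the article does not cover:

```python
import pandas as pd

data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# set_index returns a new DataFrame; `data` itself is not modified.
indexed = data.set_index("col_B")

# Select rows by the string labels of the new index.
rows = indexed.loc[["A", "B", "E"]]

# Unlike Python slices, a label slice in loc includes both endpoints.
label_slice = indexed.loc["B":"D"]

print(rows)
print(label_slice)
```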

The iloc indexer

Similarly to loc with a numeric index, we can use iloc to retrieve rows using their integer position in the DataFrame. The positions start at zero and do not depend on the index labels. Let’s retrieve the last two rows:

data.iloc[[3,4]]

   col_A col_B
3      4     D
4      5     E
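The difference between iloc and loc becomes visible once the index is no longer numeric. A short sketch, assuming the same DataFrame as before (the variable names are hypothetical):

```python
import pandas as pd

data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

indexed = data.set_index("col_B")

# iloc ignores the labels "A".."E" and counts positions from zero.
last_two = indexed.iloc[[3, 4]]

# A negative slice avoids hard-coding the number of rows.
also_last_two = indexed.iloc[-2:]

print(last_two)
```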

Using a binary mask

In Pandas, we can pass a Boolean array (a binary mask) inside the square brackets of a DataFrame to retrieve the rows at the positions where the array contains True.

We are going to need an array of bool values. The array must have the same length as our DataFrame, one value per row.

binary = [True, False, True, True, False]
data[binary]

   col_A col_B
0      1     A
2      3     C
3      4     D
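The length requirement can be checked with a short sketch. The too_short list is a hypothetical example; with a plain Python list of the wrong length, pandas raises a ValueError:

```python
import pandas as pd

data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

binary = [True, False, True, True, False]
selected = data[binary]  # keeps the rows where the mask is True

# A mask with fewer values than the DataFrame has rows is rejected.
too_short = [True, False]
try:
    data[too_short]
    error = None
except ValueError as exc:
    error = exc

print(selected)
print("Mask rejected:", error is not None)
```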

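The most popular data selection method involves generating the Boolean mask from the values in the DataFrame. A comparison such as data['col_A'] < 3 returns a Series of bool values with one entry per row, which we can use directly as the mask. For example, we can retrieve the rows in which col_A has values smaller than 3: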

data[data['col_A'] < 3]

   col_A col_B
0      1     A
1      2     B
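Conditions can also be combined, which the article does not show. The sketch below is an illustration under that assumption. It uses the element-wise operators & (and), | (or), and ~ (not); each comparison needs its own parentheses because these operators bind more tightly than < and ==:

```python
import pandas as pd

data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# Rows where col_A is greater than 1 AND smaller than 5.
middle = data[(data["col_A"] > 1) & (data["col_A"] < 5)]

# Rows where col_A is smaller than 2 OR col_B equals "E".
either = data[(data["col_A"] < 2) | (data["col_B"] == "E")]

# Rows where col_A is NOT smaller than 3.
not_small = data[~(data["col_A"] < 3)]

print(middle)
print(either)
print(not_small)
```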

Slicing a DataFrame

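Finally, we can slice a DataFrame with integer positions, which works like slicing a Python list: the start position is included, the end position is excluded, and we can pass an optional step.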

data[2:3]

   col_A col_B
2      3     C

data[:2]

   col_A col_B
0      1     A
1      2     B

data[1:]

   col_A col_B
1      2     B
2      3     C
3      4     D
4      5     E

data[::2]

   col_A col_B
0      1     A
2      3     C
4      5     E
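For a DataFrame with the default index, these integer slices select the same rows as the corresponding iloc slices. A brief sketch that checks this (the comparison with iloc is an addition, not part of the original examples):

```python
import pandas as pd

data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# Each plain slice is compared with the equivalent positional iloc slice.
checks = {
    "2:3": data[2:3].equals(data.iloc[2:3]),
    ":2": data[:2].equals(data.iloc[:2]),
    "1:": data[1:].equals(data.iloc[1:]),
    "::2": data[::2].equals(data.iloc[::2]),
}

# The end position is excluded, so 2:3 returns exactly one row.
single_row = data[2:3]

print(checks)
print(single_row)
```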

Bartosz Mikulski

  • Data/MLOps engineer by day
  • DevRel/copywriter by night
  • Python and data engineering trainer
  • Conference speaker
  • Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
  • Twitter: @mikulskibartosz