Selecting rows in Pandas

Pandas offers several ways of selecting rows from a DataFrame. Let’s take a look at the four most popular ones.

We will start with a DataFrame containing five rows:

	col_A	col_B
0	1	A
1	2	B
2	3	C
3	4	D
4	5	E
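The article shows only the resulting table, so the construction below is an assumption: a minimal sketch that builds an equivalent DataFrame with the default numeric index. The variable name data matches the later examples.

```python
import pandas as pd

# Assumed construction; the original text shows only the printed table.
data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

print(data)
```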

the loc function

First, we will use loc. It selects rows by their index labels, and we use it with square brackets instead of calling it like a function. For example, data.loc[[0, 1, 4]] returns the rows labeled 0, 1, and 4, which in our DataFrame are the first, the second, and the last row.

	col_A	col_B
0	1	A
1	2	B
4	5	E

Of course, it’s difficult to spot the benefit of using the loc function when we have a numeric index. Because of that, we will set the col_B column as the index and use its values to select the rows:

data.set_index('col_B').loc[['A', 'B', 'E']]

col_B	col_A
A	1
B	2
E	5
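Both loc calls can be tried in one short script. This is a sketch that assumes the DataFrame was built as shown in the table at the top; the last line is an extra illustration of passing a single label instead of a list.

```python
import pandas as pd

# Assumed construction of the example DataFrame.
data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# With the default index, the labels happen to be the numbers 0..4.
by_number = data.loc[[0, 1, 4]]

# After set_index, the values of col_B become the labels.
indexed = data.set_index("col_B")
by_label = indexed.loc[["A", "B", "E"]]

# A single label (not wrapped in a list) returns one row as a Series.
single_row = indexed.loc["C"]

print(by_number)
print(by_label)
print(single_row)
```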

the iloc function

Similarly to loc with a numeric index, we can use iloc to retrieve rows by their integer position in the DataFrame, regardless of the index labels. Let’s retrieve the last two rows:

data.iloc[[3,4]]

	col_A	col_B
3	4	D
4	5	E
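To see that iloc works on positions and not on labels, we can run it on the DataFrame with col_B as the index. This is a sketch under the same assumed construction as before; the negative slice is an alternative way of asking for the last two rows.

```python
import pandas as pd

# Assumed construction of the example DataFrame.
data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# Positions 3 and 4 are the last two rows.
last_two = data.iloc[[3, 4]]

# Negative positions count from the end, like in Python lists.
last_two_from_end = data.iloc[-2:]

# The labels are now letters, but the positions stay the same.
indexed = data.set_index("col_B")
last_two_indexed = indexed.iloc[[3, 4]]

print(last_two)
print(last_two_indexed)
```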

Using a binary mask

In Pandas, we can pass a boolean array (a binary mask) to the DataFrame selector to retrieve the rows at the positions where the mask is True.

We are going to need an array of bool values. The array must have the same length as our DataFrame.

binary = [True, False, True, True, False]
data[binary]

	col_A	col_B
0	1	A
2	3	C
3	4	D

The most popular data selection method generates the boolean mask from the values in the DataFrame. A comparison such as data['col_A'] < 3 produces one True or False per row. For example, we can retrieve the rows in which col_A has values smaller than 3:

data[data['col_A'] < 3]

	col_A	col_B
0	1	A
1	2	B
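It helps to look at the mask itself before using it as a selector. The sketch below assumes the same example DataFrame; the names mask, selected, and from_list are only illustrative.

```python
import pandas as pd

# Assumed construction of the example DataFrame.
data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

# The comparison returns a boolean Series with one value per row.
mask = data["col_A"] < 3
print(mask.tolist())  # [True, True, False, False, False]

# Passing the mask to the selector keeps only the rows marked True.
selected = data[mask]

# A plain list of bools of the same length works the same way.
from_list = data[[True, False, True, True, False]]

print(selected)
print(from_list)
```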

Slicing a DataFrame

Finally, we can slice a DataFrame with the same start:stop:step syntax we use for Python lists. The slice selects rows by position, and the stop position is excluded.

data[2:3]

	col_A	col_B
2	3	C

data[:2]

	col_A	col_B
0	1	A
1	2	B

data[1:]

	col_A	col_B
1	2	B
2	3	C
3	4	D
4	5	E

data[::2]

	col_A	col_B
0	1	A
2	3	C
4	5	E
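The four slices can be checked against the printed outputs with a short script. This sketch assumes the same example DataFrame and adds one case that is not in the article: slicing with integers after setting col_B as the index, which still selects by position.

```python
import pandas as pd

# Assumed construction of the example DataFrame.
data = pd.DataFrame({
    "col_A": [1, 2, 3, 4, 5],
    "col_B": ["A", "B", "C", "D", "E"],
})

one_row = data[2:3]       # only the row at position 2
first_two = data[:2]      # positions 0 and 1
all_but_first = data[1:]  # everything from position 1
every_second = data[::2]  # positions 0, 2, and 4

# Integer slices stay positional even when the labels are letters.
indexed = data.set_index("col_B")
middle = indexed[1:3]

print(every_second)
print(middle)
```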
