Understanding layer size in Convolutional Neural Networks

What size will the output have after applying a convolutional or pooling layer? I used to have no idea. I could sort of imagine what happens when a filter is applied, but once we added padding and increased the stride, my imagination got lost.

If you have a similar problem, this article is for you. I am going to explain how the filter size influences the size of the next layer, how to use padding, and what happens when you use stride.

Input

For the sake of an example, let’s use the following data as the input for the pooling layer. I am going to use max pooling because it is simple and makes a good example.

Filter

I decided to use a 3x3 filter. It means that the output shrinks by two columns and two rows. Why? Let’s look at the result of applying max pooling to the first set of cells.

Cells selected by the 3x3 max pooling filter

The maximal value of the selected cells is 7, so the output looks like this:

Result (after one step) of the 3x3 filter applied to the example input

Now, I have to move my filter. To keep things easy, the stride is 1 in the first example, so let’s move the filter by one column.

Cells selected by the 3x3 max pooling filter in the second step

That gives me another value for the output:

If I continue applying the max pooling filter, I am going to end up with this result:

What happened? Every position of the pooling filter produces only one value, and a filter that is 3 columns wide fits into 7 columns in only 5 positions. So when the stride is 1 and there is no padding, the output is going to shrink by:

number_of_lost_columns_or_rows = the_size_of_the_filter - 1

(in this example, 2 columns and 2 rows)
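The shrinking can be checked with a few lines of code. The following is a minimal sketch of max pooling with stride 1 and no padding. The function name `max_pool` and the 7x7 values are made up for illustration; they are not the data from the figures above.

```python
def max_pool(grid, size):
    """Slide a size x size window over grid (stride 1, no padding)
    and keep the maximum of every window."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for top in range(rows - size + 1):
        out_row = []
        for left in range(cols - size + 1):
            window = [grid[top + i][left + j]
                      for i in range(size) for j in range(size)]
            out_row.append(max(window))
        out.append(out_row)
    return out

# Hypothetical 7x7 input (illustrative values only).
grid = [
    [3, 1, 7, 2, 0, 4, 1],
    [0, 5, 2, 6, 1, 3, 2],
    [4, 2, 1, 0, 8, 2, 5],
    [1, 0, 3, 9, 2, 1, 0],
    [2, 6, 0, 1, 4, 7, 3],
    [5, 1, 2, 3, 0, 2, 6],
    [0, 4, 1, 5, 2, 1, 3],
]

pooled = max_pool(grid, size=3)
print(len(pooled), "x", len(pooled[0]))  # 5 x 5
```

A 7x7 input and a 3x3 filter give a 5x5 output: two rows and two columns are lost, which is the size of the filter minus one.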

Padding

What if I want the output to have the same size as the input without changing the filter? I must add two columns and two rows to the input to make up for the ones the filter loses. With zero-padding, that means a border of zeros around the input: one row of zeros at the top and one at the bottom, plus one column on the left and one on the right (a padding of 1 when padding is counted per side):

Input with zero-padding

Now, when I apply the filter, it is going to select the following cells:

so the max pooling returns this:

In the next step, the filter selects these cells:

and the result looks like this:

When the filter gets applied to all cells, this is going to be the final result:
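To see that the border of zeros really restores the original size, here is a small sketch. The helper names `zero_pad` and `max_pool` and the 4x4 values are hypothetical. Note that `pad` counts cells per side here, so `pad=1` adds two rows and two columns in total.

```python
def zero_pad(grid, pad):
    """Surround grid with `pad` rows and columns of zeros on every side."""
    width = len(grid[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in grid:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded

def max_pool(grid, size):
    """Max pooling with stride 1 and no additional padding."""
    rows, cols = len(grid), len(grid[0])
    return [
        [max(grid[top + i][left + j]
             for i in range(size) for j in range(size))
         for left in range(cols - size + 1)]
        for top in range(rows - size + 1)
    ]

# Hypothetical 4x4 input (illustrative values only).
grid = [
    [1, 2, 0, 3],
    [4, 0, 1, 2],
    [0, 5, 2, 1],
    [3, 1, 0, 6],
]

padded = zero_pad(grid, pad=1)       # 6x6
pooled = max_pool(padded, size=3)    # back to 4x4
print(len(pooled), "x", len(pooled[0]))  # 4 x 4
```

The padded input is 6x6, and the 3x3 filter removes two rows and two columns again, so the output has the same 4x4 size as the original input.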

Stride

Let’s use the input without padding again, but this time with stride = 2.

In the first step, the filter selects these cells:

Then, it moves two columns to the right, so the second step selects the following cells:

After those steps, the output contains:

If I continue, I end up with a 3x3 output. With a 7x7 input, a 3x3 filter, stride 2, and no padding, the filter fits in only three positions per row (starting at columns 1, 3, and 5) and three positions per column.
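The effect of the stride can be sketched in code as well. This version of the hypothetical `max_pool` helper takes a `stride` argument; the 7x7 input is generated by an arbitrary formula and is not the data from the figures.

```python
def max_pool(grid, size, stride=1):
    """Max pooling without padding; the window moves `stride` cells at a time."""
    rows, cols = len(grid), len(grid[0])
    return [
        [max(grid[top + i][left + j]
             for i in range(size) for j in range(size))
         for left in range(0, cols - size + 1, stride)]
        for top in range(0, rows - size + 1, stride)
    ]

# Hypothetical 7x7 input filled with arbitrary digits.
grid = [[(3 * r + 5 * c) % 10 for c in range(7)] for r in range(7)]

dense = max_pool(grid, size=3, stride=1)    # 5x5
strided = max_pool(grid, size=3, stride=2)  # 3x3
print(len(strided), "x", len(strided[0]))   # 3 x 3

# With stride 2 the filter visits every second position of the stride-1 run.
print(strided == [row[::2] for row in dense[::2]])  # True
```

In other words, a larger stride does not change what the filter computes at a given position. It only skips positions, which is why the output gets smaller.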

Formula

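Things get complicated once padding and stride are combined. Fortunately, there is a formula that lets us calculate the size of the output.

output_width = (W - F_w + 2P) / S_w + 1

W: the width of the input
F_w: the width of the filter
P: padding (the number of cells added on each side)
S_w: the horizontal stride

output_height = (H - F_h + 2P) / S_h + 1

H: the height of the input
F_h: the height of the filter
P: padding (the number of cells added on each side)
S_h: the vertical stride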
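As a quick check, the usual output-size formula, (input - filter + 2 * padding) / stride + 1, can be written as a small function and compared with the examples from this article. The function name `output_size` is made up, and rounding the division down is an assumption (it is a common default, but some libraries let you round up instead).

```python
def output_size(input_size, filter_size, padding=0, stride=1):
    """Output length along one dimension.

    Assumes `padding` counts cells added on each side and that a
    division with a remainder is rounded down.
    """
    return (input_size - filter_size + 2 * padding) // stride + 1

print(output_size(7, 3))             # 5: stride 1, no padding
print(output_size(7, 3, padding=1))  # 7: same size as the input
print(output_size(7, 3, stride=2))   # 3: stride 2, no padding
```

The same function works for the width and the height; call it once per dimension with the matching filter size and stride.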

Bartosz Mikulski
