Understanding the Keras layer input shapes

When creating a sequential model in Keras, we have to specify the input shape only for the first layer; Keras infers the shapes of the following layers automatically. The number of expected values in the shape tuple depends on the type of the first layer. I have made a list of layers and their input shape parameters.
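For example, here is a minimal sketch (assuming tf.keras; the layer sizes are arbitrary) in which only the first layer gets an input_shape and the second layer infers its input size from the previous one:

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(16,)),  # only here do we declare the input shape
    keras.layers.Dense(1, activation='sigmoid')                    # input size inferred from the previous layer
])
model.summary()  # prints the inferred output shape of every layer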

Batch size

(Almost) every kind of layer has the batch size as the first element of the input_shape tuple, but we usually don’t specify it as a part of the input definition. We specify it later, during training (for example, as the batch_size argument of fit), so I am going to skip the batch size in my examples.
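A minimal sketch of where the batch size actually shows up (the training data below is made-up random arrays, only to make the example runnable):

import numpy as np
from tensorflow import keras

x_train = np.random.rand(1000, 16)               # 1000 samples, 16 features each
y_train = np.random.randint(0, 2, size=(1000,))  # made-up binary labels

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(16,)),  # no batch size here
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy')

model.fit(x_train, y_train, batch_size=32, epochs=1)  # the batch size is given only here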

Layer input shape parameters

Dense

The actual shape depends on the number of dimensions. In the case of a one-dimensional array of n features, the input_shape looks like this: (batch_size, n). As I mentioned before, we can skip the batch_size when we define the model structure, so in the code, we write:

keras.layers.Dense(32, activation='relu', input_shape=(16,))
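If you want to verify the shapes, you can call the layer on a batch of random data (the batch of 8 samples below is made up just for the check); each sample with 16 features produces 32 output values:

import numpy as np
from tensorflow import keras

layer = keras.layers.Dense(32, activation='relu', input_shape=(16,))
batch = np.random.rand(8, 16).astype('float32')  # 8 samples, 16 features each
print(layer(batch).shape)  # (8, 32)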

Conv2D

Surprisingly, the convolutional layer used for images needs four-dimensional input. As usual, the first parameter is the batch size, and (as usual) we skip it. Next are two parameters that denote the size of the image in pixels. The last parameter is the number of channels. In the case of a standard RGB image, the number of channels is 3.

It is also possible to specify the number of channels before the size. To do this, we also have to set the data_format parameter to 'channels_first'.

keras.layers.Conv2D(128, kernel_size=3, activation='relu', input_shape=(64,64,3)) # channels_last (the default)

# or

keras.layers.Conv2D(128, kernel_size=3, activation='relu', input_shape=(3,64,64), data_format='channels_first')
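To see what the default channels_last version does with real data, you can feed it a small made-up batch of images (two 64x64 RGB images); with kernel_size=3 and the default 'valid' padding, each spatial dimension shrinks by 2:

import numpy as np
from tensorflow import keras

layer = keras.layers.Conv2D(128, kernel_size=3, activation='relu', input_shape=(64, 64, 3))
images = np.random.rand(2, 64, 64, 3).astype('float32')  # batch of 2 RGB images, 64x64 pixels
print(layer(images).shape)  # (2, 62, 62, 128)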

LSTM

In the case of an LSTM, we have three parameters: the implicit batch size, which we don’t define (as usual); the number of timesteps (in this example, 4); and the number of features of every timestep (16).

keras.layers.LSTM(units=8, input_shape=(4,16))

Note that I also had to specify the number of units (the 8 in the first parameter). That is the size of the output, and it is going to be used as the input size of the next layer.
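Because the next layer can read its input size from the LSTM output, only the first layer needs an input_shape. A minimal sketch (the Dense layer is added just to show the stacking):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.LSTM(units=8, input_shape=(4, 16)),  # output shape: (batch_size, 8)
    keras.layers.Dense(1)                             # input size (8) inferred automatically
])
model.summary()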

ConvLSTM2D

Very similar to Conv2D. We put the additional time parameter right after the batch size, so the number of timesteps is always the first element of the tuple we write, even when 'channels_first' is used; in that case, the number of channels comes second. In this example, 4 denotes the number of timesteps.

keras.layers.ConvLSTM2D(filters=128, kernel_size=3, activation='relu', input_shape=(4,64,64,3)) # channels_last (the default)

# or

keras.layers.ConvLSTM2D(filters=128, kernel_size=3, activation='relu', input_shape=(4,3,64,64), data_format='channels_first')
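As with Conv2D, you can check the shapes by calling the layer on a made-up batch (two sequences of 4 frames, each frame a 64x64 RGB image); by default, ConvLSTM2D returns only the output of the last timestep:

import numpy as np
from tensorflow import keras

layer = keras.layers.ConvLSTM2D(filters=128, kernel_size=3, activation='relu', input_shape=(4, 64, 64, 3))
frames = np.random.rand(2, 4, 64, 64, 3).astype('float32')  # 2 sequences, 4 frames each
print(layer(frames).shape)  # (2, 62, 62, 128), the last timestep only (return_sequences=False)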
