How to train a model in TensorFlow 2.0

In this article, I am going to show how to train a model in TensorFlow 2.0. The new version of TensorFlow makes it straightforward.

First, we need to import TensorFlow and Keras.

import tensorflow as tf
from tensorflow import keras

As an example, I am going to use the Fashion-MNIST dataset provided by Zalando. It is one of the “built-in” datasets available in Keras, so we can load it from the datasets package.

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

The dataset consists of grayscale images of clothes scaled to 28x28 pixels. The pixel values are in the range 0-255, so I am going to divide them by 255 to get values between 0 and 1.

train_images = train_images / 255.0
test_images = test_images / 255.0
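
As a quick sanity check of the scaling step, here is a minimal sketch that runs on a small synthetic array instead of the real dataset. The names `fake_images`, `scaled`, and `scaled32` are made up for this illustration. It also shows an optional variation that is not part of the article's pipeline: casting to float32 first, which halves the memory used by the scaled array.

```python
import numpy as np

# Synthetic stand-in for a few Fashion-MNIST images: uint8 values in 0-255.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Dividing a uint8 array by a Python float produces a float64 array.
scaled = fake_images / 255.0

# Optional variation (an assumption, not used in the article): cast to
# float32 before dividing so the result stays float32.
scaled32 = fake_images.astype(np.float32) / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
print(scaled32.dtype, scaled32.nbytes, scaled.nbytes)
```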

My model starts with a convolutional layer, so I have to reshape the input. It must fit the expected (number of samples in a batch, dim1, dim2, number of channels) shape.

In my training dataset, I have 60000 images. Every one of them has 28x28 pixels and one color channel. Hence the proper shape of the input for the convolutional layer is (60000, 28, 28, 1).

To avoid hardcoding the number of images, I use the len function. I also have to repeat that operation for the test dataset.

train_images = train_images.reshape(len(train_images), 28, 28, 1)
test_images = test_images.reshape(len(test_images), 28, 28, 1)
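
The reshape only adds a trailing channel axis, so it can be checked on a tiny placeholder array. The sketch below uses a hypothetical `images` array of zeros in place of the dataset and compares the `len`-based call with two equivalent NumPy spellings: letting `reshape` infer the first dimension with `-1`, and appending an axis with `np.newaxis`.

```python
import numpy as np

# Placeholder for a batch of five 28x28 grayscale images.
images = np.zeros((5, 28, 28))

# The approach used in the article: pass the number of samples explicitly.
by_len = images.reshape(len(images), 28, 28, 1)

# Equivalent alternatives: infer the sample count, or append a new axis.
by_inference = images.reshape(-1, 28, 28, 1)
by_new_axis = images[..., np.newaxis]

print(by_len.shape, by_inference.shape, by_new_axis.shape)
```

All three produce the same (samples, 28, 28, 1) shape, so which one to use is a matter of taste.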

Now, I can define the model layers. I want to start with convolutional layers to give the model a chance of detecting shapes in the images. The kernel size is the size of the convolution window (if only one value is specified, it uses the same value for both dimensions).

Let’s use the following values as the layer configuration. Because it is just an example, I am not going to do any hyperparameter search. Instead, I tried a few configurations by hand and kept the one that gave the best accuracy.

model = keras.Sequential([
    # Note that the shape definition does not include the number of samples.
    keras.layers.Conv2D(64, kernel_size=3, activation='relu', input_shape=(28, 28, 1)),
    keras.layers.Conv2D(32, kernel_size=3, activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.1),
    keras.layers.Flatten(),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dropout(0.1),
    keras.layers.Dense(10, activation='softmax')
])
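
To see how the image shrinks as it passes through these layers, the arithmetic can be sketched in plain Python. This assumes the Keras defaults that the code above does not override: `padding='valid'` and a stride of 1 for `Conv2D`, and a stride equal to the pool size for `MaxPooling2D`. The helper names `conv_output`, `conv_params`, and `dense_params` are made up for this example.

```python
def conv_output(size, kernel):
    # 'valid' padding with stride 1: the window fits (size - kernel + 1) times.
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block per filter, plus a bias each.
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # One weight per input-unit pair, plus one bias per unit.
    return inputs * units + units

side = 28
side = conv_output(side, 3)   # after the first Conv2D
side = conv_output(side, 3)   # after the second Conv2D
side = side // 2              # after MaxPooling2D with a 2x2 pool
flat = side * side * 32       # Flatten: height * width * filters of the last Conv2D

# Dropout, MaxPooling2D, and Flatten have no trainable parameters.
total = (conv_params(3, 1, 64) + conv_params(3, 64, 32)
         + dense_params(flat, 32) + dense_params(32, 10))

print(side, flat, total)
```

Under those assumptions the numbers should agree with what `model.summary()` reports for this model.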

After defining the layers, I have to compile the model, which configures the optimizer, the loss function, and the metrics. Because we are trying to do multiclass classification with integer labels, I define the loss as sparse categorical cross-entropy. The loss value is hard to interpret on its own, so I am also going to track the accuracy, which is a more human-friendly metric.

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
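
To make the two numbers less abstract, here is a simplified pure-Python sketch of what they measure. It is not the Keras implementation (which, among other things, guards against taking the logarithm of zero); the function names and the three-class toy data are made up for this illustration. The "sparse" variant expects each label as a class index rather than a one-hot vector.

```python
import math

def sparse_categorical_crossentropy(labels, probabilities):
    # For each sample, take the negative log of the probability that the
    # model assigned to the correct class, then average over the samples.
    losses = [-math.log(p[y]) for y, p in zip(labels, probabilities)]
    return sum(losses) / len(losses)

def accuracy(labels, probabilities):
    # A prediction counts as correct when the most probable class is the label.
    hits = sum(1 for y, p in zip(labels, probabilities) if p.index(max(p)) == y)
    return hits / len(labels)

# Toy data: three samples, three classes, labels given as class indices.
labels = [0, 1, 2]
probabilities = [
    [0.7, 0.2, 0.1],  # correct and confident
    [0.1, 0.8, 0.1],  # correct and confident
    [0.5, 0.3, 0.2],  # wrong: the true class 2 got only 0.2
]

loss = sparse_categorical_crossentropy(labels, probabilities)
acc = accuracy(labels, probabilities)
print(loss, acc)
```

The loss penalizes the third sample heavily because the model put little probability on the true class, while the accuracy only records that the prediction was wrong.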

I will train the model for 10 epochs.

model.fit(train_images, train_labels, epochs=10)

Output:

Train on 60000 samples
Epoch 1/10
60000/60000 [==============================] - 145s 2ms/sample - loss: 0.4928 - accuracy: 0.8244
Epoch 2/10
60000/60000 [==============================] - 143s 2ms/sample - loss: 0.3130 - accuracy: 0.8882
Epoch 3/10
60000/60000 [==============================] - 143s 2ms/sample - loss: 0.2639 - accuracy: 0.9039
Epoch 4/10
60000/60000 [==============================] - 143s 2ms/sample - loss: 0.2310 - accuracy: 0.9151
Epoch 5/10
60000/60000 [==============================] - 142s 2ms/sample - loss: 0.2107 - accuracy: 0.9215
Epoch 6/10
60000/60000 [==============================] - 142s 2ms/sample - loss: 0.1910 - accuracy: 0.9286
Epoch 7/10
60000/60000 [==============================] - 141s 2ms/sample - loss: 0.1791 - accuracy: 0.9333
Epoch 8/10
60000/60000 [==============================] - 141s 2ms/sample - loss: 0.1614 - accuracy: 0.9394
Epoch 9/10
60000/60000 [==============================] - 142s 2ms/sample - loss: 0.1533 - accuracy: 0.9413
Epoch 10/10
60000/60000 [==============================] - 141s 2ms/sample - loss: 0.1438 - accuracy: 0.9444

In the end, I can run the evaluate function to check the model’s performance on the test dataset.

model.evaluate(test_images, test_labels)

Output (the first number is the loss, the second one is the accuracy):

10000/10000 [==============================] - 6s 571us/sample - loss: 0.2699 - accuracy: 0.9115
[0.2698916170120239, 0.9115]
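
Since the returned list holds the loss first and the accuracy second, it can be unpacked into named variables. The sketch below does not call the model; it copies the values printed above and compares the test accuracy with the training accuracy from the last epoch. The variable names are made up for this example.

```python
# Values copied from the outputs shown in this article.
evaluation = [0.2698916170120239, 0.9115]
final_train_accuracy = 0.9444

# The first element is the loss, the second one is the accuracy.
test_loss, test_accuracy = evaluation

# Difference between the last-epoch training accuracy and the test accuracy.
gap = final_train_accuracy - test_accuracy

print(f"test loss: {test_loss:.4f}")
print(f"test accuracy: {test_accuracy:.4f}")
print(f"train/test accuracy gap: {gap:.4f}")
```

The test accuracy is about three percentage points lower than the training accuracy, which is why the model should be judged on data it has not seen during training.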

Bartosz Mikulski
Bartosz Mikulski * data scientist / software engineer * conference speaker * organizer of School of A.I. meetups in Poznań * co-founder of Software Craftsmanship Poznan & Poznan Scala User Group