How to choose the right mini-batch size in deep learning
This blog post contains a summary of Andrew Ng’s advice regarding choosing the mini-batch size for gradient descent while training a deep learning model. Fortunately, this hint is not complicated, so the blog post is going to be extremely short ;)
Andrew Ng recommends not using mini-batches if the number of observations is smaller than 2000; with that little data, you can simply run batch gradient descent on the whole training set at once. In all other cases, he suggests using a power of 2 as the mini-batch size, so the mini-batch should be 64, 128, 256, 512, or 1024 elements large.
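The rule above can be sketched in a few lines of Python. This is a minimal sketch assuming the training set is an in-memory list; `choose_batch_size` and `iterate_minibatches` are hypothetical helpers (not part of any library), and the default of 256 is just one of the suggested sizes picked for illustration.

```python
import random

# Mini-batch sizes suggested in the advice: powers of 2 from 64 to 1024.
POWER_OF_TWO_SIZES = (64, 128, 256, 512, 1024)
# Below this many observations, the advice is to skip mini-batches entirely.
SMALL_DATASET_THRESHOLD = 2000


def choose_batch_size(num_examples, preferred=256):
    """Hypothetical helper: pick the batch size for a training set of num_examples rows."""
    if preferred not in POWER_OF_TWO_SIZES:
        raise ValueError(f"preferred must be one of {POWER_OF_TWO_SIZES}")
    if num_examples < SMALL_DATASET_THRESHOLD:
        # Small dataset: use the whole training set in every step (batch gradient descent).
        return num_examples
    return preferred


def iterate_minibatches(examples, batch_size, seed=0):
    """Hypothetical helper: yield shuffled mini-batches covering every example once (one epoch)."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        # The last batch may be smaller when len(examples) is not a multiple of batch_size.
        yield [examples[i] for i in indices[start:start + batch_size]]


if __name__ == "__main__":
    for n in (1500, 10_000):
        size = choose_batch_size(n)
        batches = list(iterate_minibatches(list(range(n)), size))
        print(n, size, len(batches), len(batches[-1]))
```

Running it prints `1500 1500 1 1500` and `10000 256 40 16`: the small dataset is processed as a single batch, while the larger one is split into 39 full mini-batches of 256 plus a final batch of 16.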
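The most important part of the advice is making sure that the mini-batch fits in CPU/GPU memory! If the whole mini-batch fits in CPU/GPU memory, the processor can work on it without constantly moving data in and out, which significantly reduces the time required to train a model. A mini-batch that does not fit will either stop training with an out-of-memory error or slow it down considerably.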
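To combine the memory requirement with the power-of-2 sizes, one option is to pick the largest candidate whose estimated footprint fits. This is a rough sketch under loudly stated assumptions: memory is modelled as a fixed overhead (for example, model weights) plus a per-example cost that grows linearly with the batch size, and both numbers are hypothetical inputs you would have to estimate or measure yourself. Real frameworks add allocator overhead, so treat the result as a starting point rather than a guarantee.

```python
# Candidate sizes from the advice: powers of 2 from 64 to 1024.
POWER_OF_TWO_SIZES = (64, 128, 256, 512, 1024)


def largest_fitting_batch_size(bytes_per_example, memory_budget_bytes, fixed_overhead_bytes=0):
    """Hypothetical helper: return the largest candidate size whose estimated
    footprint fits in the memory budget, or None if even the smallest does not.

    Assumes a linear memory model:
        footprint = fixed_overhead_bytes + batch_size * bytes_per_example
    """
    best = None
    for size in POWER_OF_TWO_SIZES:
        estimated = fixed_overhead_bytes + size * bytes_per_example
        if estimated <= memory_budget_bytes:
            best = size
    return best


if __name__ == "__main__":
    MiB = 2 ** 20
    # Hypothetical numbers: 4 MiB per example, 512 MiB of fixed overhead, 2 GiB budget.
    print(largest_fitting_batch_size(4 * MiB, 2048 * MiB, fixed_overhead_bytes=512 * MiB))
```

With these made-up numbers the sketch prints `256`: a batch of 512 would need 512 + 512 × 4 = 2560 MiB, which exceeds the 2048 MiB budget, while 256 needs only 1536 MiB.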