How to choose the right mini-batch size in deep learning
This blog post contains a summary of Andrew Ng’s advice regarding choosing the mini-batch size for gradient descent while training a deep learning model. Fortunately, this hint is not complicated, so the blog post is going to be extremely short ;)
Andrew Ng recommends not using mini-batches if the number of observations is smaller then 2000. In all other cases, he suggests using a power of 2 as the mini-batch size. So the minibatch should be 64, 128, 256, 512, or 1024 elements large.

The most important aspect of the advice is making sure that the mini-batch fits in the CPU/GPU memory! If data fits in CPU/GPU, we can leverage the speed of processor cache, which significantly reduces the time required to train a model!
You may also like
- How to increase accuracy of a deep learning model
- How to automatically select the hyperparameters of a ResNet neural network
- Which hyperparameters of deep learning model are important and how to find them
- Understanding the Keras layer input shapes
- Save and restore a Tensorflow model using Keras for continuous model training
Remember to share on social media! If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.
If you want to contact me, send me a message on LinkedIn or Twitter.
Would you like to have a call and talk? Please schedule a meeting using this link.