How to choose the right mini-batch size in deep learning
This blog post summarizes Andrew Ng’s advice on choosing the mini-batch size for gradient descent when training a deep learning model. Fortunately, the advice is not complicated, so the blog post is going to be extremely short ;)
Andrew Ng recommends not using mini-batches if the number of observations is smaller than 2000; in that case, run gradient descent on the whole training set at once. In all other cases, he suggests using a power of 2 as the mini-batch size. So the mini-batch should be 64, 128, 256, 512, or 1024 elements large.
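The rule of thumb above is simple enough to write down as a tiny helper. This is only a sketch: the function name `choose_batch_size` and the `preferred` default of 128 are my own assumptions for illustration, as the advice itself does not say which of the candidate sizes to start with.

```python
# Candidate sizes listed in the post: powers of 2 from 64 to 1024.
CANDIDATE_BATCH_SIZES = (64, 128, 256, 512, 1024)

# Threshold from the post: with fewer observations, skip mini-batches.
SMALL_DATASET_THRESHOLD = 2000


def choose_batch_size(num_observations, preferred=128):
    """Return a batch size that follows the rule of thumb.

    `preferred` is a hypothetical starting point (an assumption of this
    sketch); it must be one of the candidate powers of 2.
    """
    if num_observations <= 0:
        raise ValueError("num_observations must be positive")
    if num_observations < SMALL_DATASET_THRESHOLD:
        # Small dataset: a single batch that contains every observation.
        return num_observations
    if preferred not in CANDIDATE_BATCH_SIZES:
        raise ValueError(f"preferred must be one of {CANDIDATE_BATCH_SIZES}")
    return preferred


print(choose_batch_size(1500))    # small dataset, so the whole set is one batch
print(choose_batch_size(50_000))  # large dataset, so the preferred power of 2
```

The returned value is what you would typically pass as the batch size to your training loop, for example the `batch_size` argument of `model.fit` in Keras.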
The most important aspect of the advice is making sure that the mini-batch fits in the CPU/GPU memory! If the mini-batch fits in CPU/GPU memory, we can leverage the speed of the processor cache, which significantly reduces the time required to train a model!
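A rough way to check the memory constraint is to estimate how many bytes one mini-batch of input data occupies and compare that with a budget. The sketch below is a back-of-the-envelope estimate under loud assumptions: every value is stored as a 4-byte float, and only the input data is counted (model weights, activations, and gradients also need memory, so the real limit is lower). The function name and the memory budget are hypothetical.

```python
# Candidate sizes listed in the post: powers of 2 from 64 to 1024.
CANDIDATE_BATCH_SIZES = (64, 128, 256, 512, 1024)


def largest_fitting_batch_size(values_per_observation,
                               memory_budget_bytes,
                               bytes_per_value=4):
    """Return the largest candidate batch size whose input data fits
    in the given budget, or None if even the smallest does not fit.

    Assumption: this counts input data only; weights, activations,
    and gradients are ignored, so treat the result as an upper bound.
    """
    bytes_per_observation = values_per_observation * bytes_per_value
    fitting = [
        size for size in CANDIDATE_BATCH_SIZES
        if size * bytes_per_observation <= memory_budget_bytes
    ]
    return max(fitting) if fitting else None


# Example: 224x224 RGB images as float32 and a hypothetical 256 MiB budget.
values_per_image = 224 * 224 * 3
budget = 256 * 1024 * 1024
print(largest_fitting_batch_size(values_per_image, budget))  # 256
```

In practice, it is easier to start from the estimate and then halve the batch size if the framework reports an out-of-memory error.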
- Twitter: @mikulskibartosz