How to choose the right mini-batch size in deep learning

This blog post summarizes Andrew Ng's advice on choosing the mini-batch size for gradient descent when training a deep learning model. Fortunately, the advice is not complicated, so this post is going to be extremely short ;)

Andrew Ng recommends not using mini-batches at all if the number of observations is smaller than 2000. In all other cases, he suggests using a power of 2 as the mini-batch size, so the mini-batch should contain 64, 128, 256, 512, or 1024 elements.
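
To make the rule concrete, here is a minimal sketch in PyTorch (my choice of framework; the `pick_batch_size` helper and the synthetic dataset are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def pick_batch_size(num_observations: int) -> int:
    """Rule of thumb from the post: plain batch gradient descent below
    2000 observations, otherwise a power of 2 between 64 and 1024."""
    if num_observations < 2000:
        return num_observations  # no mini-batches: process the whole dataset at once
    return 1024  # or 512/256/128/64 -- the largest size that still fits in memory

# Hypothetical dataset: 50,000 observations with 20 features each.
features = torch.randn(50_000, 20)
labels = torch.randint(0, 2, (50_000,))
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=pick_batch_size(len(dataset)), shuffle=True)
```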

The most important aspect of the advice is making sure that the mini-batch fits in CPU/GPU memory! When it does, we can leverage the speed of the processor cache, which significantly reduces the time required to train a model!
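
A crude but practical way to check that a candidate mini-batch fits in GPU memory is to run a single forward and backward pass and watch for an out-of-memory error. The sketch below is my own illustration, not something from the original post; the model and batch shape are placeholders:

```python
import torch
import torch.nn as nn

def batch_fits_in_gpu(model: nn.Module, batch: torch.Tensor, device: str = "cuda") -> bool:
    """Run one forward/backward pass and report whether it ran out of GPU memory."""
    model = model.to(device)
    try:
        loss = model(batch.to(device)).sum()
        loss.backward()  # the backward pass allocates memory for gradients too
        return True
    except RuntimeError as error:  # CUDA OOM surfaces as a RuntimeError in PyTorch
        if "out of memory" in str(error):
            torch.cuda.empty_cache()
            return False
        raise

# Placeholder model and a candidate mini-batch of 1024 observations, 20 features each.
model = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 2))
if torch.cuda.is_available():
    print(batch_fits_in_gpu(model, torch.randn(1024, 20)))
```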
