How to choose the right mini-batch size in deep learning

This blog post summarizes Andrew Ng’s advice on choosing the mini-batch size for gradient descent when training a deep learning model. Fortunately, the advice is not complicated, so the post is going to be extremely short ;)

Andrew Ng recommends not using mini-batches if the number of observations is smaller than 2000. With so few observations, you can run gradient descent on the whole dataset at once (batch gradient descent). In all other cases, he suggests using a power of 2 as the mini-batch size, so the mini-batch should contain 64, 128, 256, 512, or 1024 elements.
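The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not code from the course: the names `choose_batch_size` and `minibatch_indices`, the default of 128, and the fixed shuffle seed are my own assumptions.

```python
import random

SMALL_DATASET_THRESHOLD = 2000  # threshold taken from the advice above
POWER_OF_TWO_SIZES = (64, 128, 256, 512, 1024)


def choose_batch_size(n_samples, preferred=128):
    """Return a batch size that follows the rule of thumb (hypothetical helper)."""
    if n_samples < SMALL_DATASET_THRESHOLD:
        # Small dataset: use every observation in a single batch.
        return n_samples
    if preferred not in POWER_OF_TWO_SIZES:
        raise ValueError(f"preferred must be one of {POWER_OF_TWO_SIZES}")
    return preferred


def minibatch_indices(n_samples, batch_size, seed=0):
    """Yield shuffled lists of row indices, one list per mini-batch."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        # The last mini-batch may be smaller than batch_size.
        yield indices[start:start + batch_size]
```

For example, `choose_batch_size(1500)` returns 1500 (no mini-batches), while `choose_batch_size(50000, preferred=256)` returns 256. Which of the five sizes works best still has to be checked by experiment.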

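The most important part of the advice is making sure that the mini-batch fits in the CPU/GPU memory! If the whole mini-batch fits in the memory of the device doing the computation, it can be processed in one vectorized step, which significantly reduces the time required to train a model. A mini-batch that does not fit typically either fails with an out-of-memory error (on a GPU) or forces slow swapping (on a CPU).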
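To get a rough feeling for whether a mini-batch fits in memory, you can estimate the size of its input data. The sketch below is a back-of-the-envelope calculation under strong assumptions: it counts only the raw input values (4 bytes each, as for 32-bit floats) and ignores model weights, activations, and gradients, which usually take much more space. The function names and the memory budget are hypothetical.

```python
POWER_OF_TWO_SIZES = (64, 128, 256, 512, 1024)


def batch_input_bytes(batch_size, values_per_sample, bytes_per_value=4):
    """Bytes needed to hold the raw inputs of one mini-batch."""
    return batch_size * values_per_sample * bytes_per_value


def largest_fitting_batch_size(values_per_sample, budget_bytes, bytes_per_value=4):
    """Largest candidate size whose inputs fit in the budget, or None."""
    fitting = [
        size
        for size in POWER_OF_TWO_SIZES
        if batch_input_bytes(size, values_per_sample, bytes_per_value) <= budget_bytes
    ]
    return max(fitting) if fitting else None


# Example: 224x224 RGB images and an assumed 256 MiB budget for the inputs.
values_per_image = 224 * 224 * 3
budget = 256 * 1024 * 1024
print(largest_fitting_batch_size(values_per_image, budget))  # prints 256
```

In practice, the reliable test is to run a single training step with the chosen size and watch the memory usage, because the estimate above covers the inputs only.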

Bartosz Mikulski: data scientist, software/data engineer, conference speaker, organizer of School of A.I. meetups in Poznań, co-founder of Software Craftsmanship Poznan & Poznan Scala User Group