How to set the global random_state in Scikit Learn

This information belongs in the first paragraph of the Scikit Learn manual, but it is buried somewhere in the FAQ, so let’s write about it here.

Scikit Learn does not have its own global random state. Whenever you do not pass a random_state explicitly, it uses the numpy global random state instead. If you want reproducible results in a Jupyter Notebook (you should want that ;) ), set the seed at the beginning of your notebook:

import numpy as np
np.random.seed(31415)

How can we check if it works? Run this code:

import numpy as np
from scipy.stats import norm  # norm.rvs draws from the numpy global random state

print('Without seed')
print(norm.rvs(10, size=4))
print(norm.rvs(10, size=4))

print('With the same seed')
np.random.seed(31415)
print(norm.rvs(10, size=4))
np.random.seed(31415) # reset the random seed back to 31415
print(norm.rvs(10, size=4))

print('Without seed')
np.random.seed(None) # reseed from fresh, unpredictable entropy
print(norm.rvs(10, size=4))
print(norm.rvs(10, size=4))

In my case the output was:

Without seed
[11.87381912 10.67665352 10.93843519  9.68574986]
[10.16669138  9.41330164  9.64055638  8.49694282]
With the same seed
[11.36242188 11.13410818 12.36307449  9.74043318]
[11.36242188 11.13410818 12.36307449  9.74043318]
Without seed
[ 8.79608103  9.40920579 11.23146236 10.18055655]
[11.5560791   9.77978961 11.9580387  11.39481905]
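
The check above only exercises scipy, so here is a minimal sketch of the same idea applied to Scikit Learn itself. It assumes that train_test_split, called without a random_state argument, draws from the numpy global random state. The toy arrays X and y and the helper split_test_labels are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, invented for this example: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

def split_test_labels(seed):
    """Hypothetical helper: seed the numpy global state, then split."""
    np.random.seed(seed)
    # No random_state is passed, so the split relies on the global seed.
    _, _, _, y_test = train_test_split(X, y, test_size=3)
    return y_test

first = split_test_labels(31415)
second = split_test_labels(31415)

print(first)
print(second)
print('Same split:', np.array_equal(first, second))
```

If the two printed arrays match, the global seed is controlling the split. When you need only a single call to be reproducible, passing random_state=31415 directly to train_test_split should work too, without touching the global state.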