Building trustworthy data pipelines because AI cannot learn from dirty data
Language is all about nouns
Programmers are afraid of nouns. We often replace them with poorly written descriptions of things.
12 Sep 2018
Outlier detection with scikit-learn
Z-score and Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
10 Sep 2018
How to set the global random_state in scikit-learn
What to do if you keep forgetting to set the random_state?
31 Aug 2018
JUG Thüringen meetup - retrospective
My opinion about my presentation at a meetup in Erfurt, Germany.
29 Aug 2018
How to split a list inside a DataFrame cell into rows in Pandas
Step-by-step instructions to "explode" a list into DataFrame rows.
27 Aug 2018
Interactive plots in Jupyter Notebook
How to create a plot that supports zooming
24 Aug 2018
[book review] James Whittaker's Little Book of the Future
Read this book if you believe we can use A.I. and IoT to build a bright future.
22 Aug 2018
Probability plot - visually compare probability distributions
How to visually check whether your sample is normally distributed?
20 Aug 2018
Count unique elements of an infinite stream of objects
HyperLogLog - a probabilistic counting algorithm
19 Aug 2018
Live unit testing with sbt
Can I have the coolest Visual Studio feature in IntelliJ?
18 Aug 2018
Monte Carlo simulation in Python
How to make business decisions using a Monte Carlo simulation?
17 Aug 2018
Word cloud from a Pandas data frame
Create a nice visualization of the most popular words in your data frame
07 Aug 2018