Archive of posts with category 'Data Science'
Numpy reshape explained
How to use the reshape function in Numpy
Human bias in A/B testing
Underpowered tests, true negatives, and ignored test results
Smoothing time series in Python using Savitzky–Golay filter
In this article, I will show you how to use the Savitzky–Golay filter in Python and how it works. To understand the Savitzky–Golay filter, you should be familiar...
XGBoost hyperparameter tuning in Python using grid search
Using GridSearchCV from Scikit-Learn to tune XGBoost classifier
Forecasting time series: using lag features
How to turn Pandas data frame into time-series input for RNN
From Pandas dataframe to RNN input
How to measure the similarity of sequence values
Levenshtein distance and Kendall tau distance
Measuring document similarity in machine learning
How to measure the similarity of two datasets?
Why do most data science projects fail?
Product/market fit - building a data-driven product
How to test a product idea?
Notetaking for data science
How to document a project?
Wilson score in Python - example
How to get the value by rank from a grouped Pandas dataframe
How to rank a grouped data frame in Pandas
The difference between the expanding and rolling window in Pandas
How to use rolling window with datetime (and other types) in Pandas
Write everything down
Lessons learnt from "Practical Data Cleaning" by Lee Baker
How to display all columns of a Pandas DataFrame in Jupyter Notebook
The silly mistakes in exploratory data analysis
Smoothing time series in Pandas
How to use the exponentially weighted window functions in Pandas
How to reduce memory usage in Pandas
Fit more data in the same amount of memory
Guidelines for data science teams — a summary of Daniel Molnar’s talks
Avoiding over-engineering in machine learning
How to return rows with missing values in Pandas DataFrame
How does it work and why the most popular solution is wrong
Predicting customer lifetime value using the Pareto/NBD model and Gamma-Gamma model
How to estimate the CLV from a list of customer transactions using the lifetimes library in Python
Predicting customer churn using the Pareto/NBD model
How to use the Python lifetimes library to build a Pareto/NBD model.
Business metrics that make no sense
How to define metrics that won’t destroy your business.
How to perform an A/B test correctly in Python
What can we expect from a correctly performed A/B test?
Recommendations vs. raw data — what is better?
Should we suggest an action when we visualize data?
How to display mathematical equations in Jupyter Notebook
LaTeX support in Jupyter Notebook
Apriori algorithm explained
How to change plot size in Jupyter Notebook
Pyplot parameter that configures the chart size
Looking for structure in data — Andrews curves plot explained
How to read Andrews curves chart
Finding seasonality in time series using autocorrelation plot
How to interpret an autocorrelation plot?
My favourite data science podcasts
I was asked for some podcast recommendations, so here is my very short list ;)
A podcast that changed my perspective on exploratory data analysis
How to avoid bad science
How to read a confusion matrix
Predicted labels are in columns, right? Or maybe in rows? Do you remember? ;)
F1 score explained
The mathematics behind the F1 score.
How to display a progress bar in Jupyter Notebook
Display a progress bar with no additional dependencies, just Python + Jupyter Notebook
How to save a machine learning model into a file
Saving a Scikit-learn model using the joblib library in Python
Bootstrapping vs. bagging
The difference explained
Understanding uncertainty intervals generated by Prophet
How to tweak uncertainty intervals in Prophet.
Prophet plot explained
How to read the Prophet forecast plot
How to visualise prediction errors
How to explain the errors of a linear regression model
Test-driven development in Jupyter Notebook
TDD for data scientists working with Jupyter Notebook
Dealing with dates and time in Pandas
How to use Pandas to parse dates or calculate time in a different timezone.
Fill missing values using Random Forest
How to predict the missing values using Scikit-Learn
Box and whiskers plot
How to plot and interpret the box and whiskers plot
How I failed to plot parallel coordinates in Matplotlib
Built-in matplotlib functions are not enough in this case
Import Jupyter Notebook from GitHub
The easiest way to access someone else’s code in your own notebook
Fill missing values in Pandas
Use the next or previous value to fill the missing values in Pandas
Heat map with Matplotlib
A short tutorial about generating a heat map of the values stored in a Pandas dataframe
Outlier detection with Scikit-Learn
Z-score and Density-Based Spatial Clustering of Applications with Noise
How to split a list inside a Dataframe cell into rows in Pandas
Step by step instructions to "explode" a list into DataFrame rows.
Interactive plots in Jupyter Notebook
How to create a plot that supports zooming
Probability plot - visually compare probability distributions
How to visually check whether your sample is normally distributed?
Monte Carlo simulation in Python
How to make business decisions using the Monte Carlo simulation?
Word cloud from a Pandas data frame
Create a nice visualization of the most popular words in your data frame
Visualize common elements of two datasets using NetworkX
How to use an undirected graph to visualize common elements of two Pandas data frames
How to load data from Google Drive to Pandas running in Google Colaboratory
