mikulskibartosz.name
Bartosz Mikulski
Building trustworthy data pipelines because AI cannot learn from dirty data
All Stories
Generalized Linear Models — Using linear regression when the dependent variable does not follow Gaussian distribution
Understanding the GLM from the statsmodels package
PCA — how to choose the number of components?
How many principal components do we need when using Principal Component Analysis?
How to avoid bias against underrepresented target classes while training a machine learning model
The difference between KFold and StratifiedKFold in Scikit-learn
How to get the value by rank from a grouped Pandas dataframe
How to rank a grouped data frame in Pandas
The difference between the expanding and rolling window in Pandas
How to use rolling window with datetime (and other types) in Pandas
Write everything down
Lessons learnt from "Practical Data Cleaning" by Lee Baker
Understanding layer size in Convolutional Neural Networks
Filter size, padding, and stride explained
Calculating the cumulative sum of a group using Apache Spark
How to use the window function to calculate a cumulative sum
How to write to a Parquet file in Scala without using Apache Spark
Row number in Apache Spark window — row_number, rank, and dense_rank
This article is mostly a “note to self” because I don’t want to google that anymore ;)
How to display all columns of a Pandas DataFrame in Jupyter Notebook
Review of “Conversations On Data Science” by Roger D. Peng and Hilary Parker