Building trustworthy data pipelines because AI cannot learn from dirty data
Data engineers are data librarians, or how to upgrade your data lake to 2500 BCE technology
What can data engineers learn from (ancient) librarians?
11 Mar 2022
The problem with software testing in data engineering
What if we found a bug in our data pipelines? What if that bug were easy to fix, but it would require a lot of...
15 Jun 2020
Data flow - what functional programming and Unix philosophy can teach us about data streaming
What does stream processing have in common with functional programming and Unix?
04 May 2020
Four books to boost your programmer career
I quit my dream job because of a book
06 Jan 2020
How to debug code
How to debug code and solve problems as fast as possible
20 Sep 2022
CUPID properties in data engineering
SOLID principles vs. CUPID properties in data engineering
10 Sep 2022
How to add tests to existing code in data transformation pipelines
How data engineers can write tests for legacy code in their ETL pipelines without breaking the existing implementation
30 Aug 2022
Software engineering practices in data engineering and data science
How to produce high-quality software in data teams
20 Aug 2022
How to sort a Pandas DataFrame by month name
How to use an ordered categorical variable to sort a Pandas DataFrame by month while displaying month names
15 Aug 2022
How to become a data engineer for free
What do you need to know to become a data engineer? Does a data engineer need a degree? How can you get your first data engineering job?
10 Aug 2022
A comprehensive guide to Kappa Architecture
What is Kappa Architecture? When should we use Kappa Architecture? What's the difference between Kappa Architecture and Lambda Architecture? And way, way more!
30 Jul 2022
The secret of working with legacy code on a software team
How to work with code written by other people? What to do when you join a new team?
20 Jul 2022
Functional programming in Python
Does functional programming in Python make sense?
10 Jul 2022
How to write technical documentation
How to document a software project?
20 Jun 2022
ETL vs ELT - what's the difference? Which one should you choose?
Should you use a data warehouse or build a data lake? When is a data warehouse a better choice? When is it better to build a data lake?
10 Jun 2022
Selecting rows in Pandas
How to use loc, iloc, slicing, and row filtering in Pandas
27 May 2022