Building trustworthy data pipelines because AI cannot learn from dirty data
Check-Engine - data quality validation for PySpark 3.0.0
Last week, I was testing whether we could use AWS Deequ for data quality validation. I ran into a few problems. First of all, it...
06 Jul 2020
The problem with software testing in data engineering
What if we found a bug in our data pipelines? What if that bug were easy to fix, but it would require a lot of...
15 Jun 2020
Data flow - what functional programming and Unix philosophy can teach us about data streaming
What does stream processing have in common with functional programming and Unix?
04 May 2020
Four books to boost your programmer career
I quit my dream job because of a book
06 Jan 2020
Testing legacy data pipelines
Do you struggle with maintaining your legacy data pipelines? Check out our article on how to add tests and refactor your code while working with legacy data pipelines.
21 Jan 2022
Secrets of mentoring junior software engineers
How to quickly train junior engineers to make them as productive as the rest of the team
14 Jan 2022
What does your data pipeline need in production?
When you're debugging a failing production pipeline at 2 am, what do you need?
07 Jan 2022
How to pass a machine learning engineer interview
Trivial (and easily fixable) mistakes that will make you fail a job interview
31 Dec 2021
Why do data engineers quit?
Why do data engineers quit their jobs?
24 Dec 2021
What is the essential KPI of an MLOps team?
Which KPI to measure in an MLOps team
17 Dec 2021
Deploying your first ML model in production
The minimal setup for ML deployment without the things you DON'T need yet
10 Dec 2021
Is it overengineered?
What's the difference between reasonable future-proof architecture and overengineering? Is there a difference?
04 Dec 2021
Pattern matching in Python vs Scala
What is the difference between pattern matching in Python and Scala?
26 Nov 2021
Should you use machine learning in your product?
How to put AI in production without overengineering your system
19 Nov 2021
How does the Atlan data platform help you ensure data quality?
Atlan - a tool for facilitating a collaborative data culture
15 Nov 2021
What should you learn as a data engineer?
Should you spend time learning data engineering tools and libraries?
12 Nov 2021