Building trustworthy data pipelines because AI cannot learn from dirty data
Check-Engine - data quality validation for PySpark 3.0.0
Last week, I was testing whether we could use AWS Deequ for data quality validation. I ran into a few problems. First of all, it...
06 Jul 2020
The problem with software testing in data engineering
What if we found a bug in our data pipelines? What if that bug were easy to fix, but it would require a lot of...
15 Jun 2020
Data flow - what functional programming and Unix philosophy can teach us about data streaming
What does stream processing have in common with functional programming and Unix?
04 May 2020
Four books to boost your programmer career
I quit my dream job because of a book
06 Jan 2020
Pattern matching in Python vs Scala
What is the difference between pattern matching in Python and Scala?
26 Nov 2021
Should you use machine learning in your product?
How to put AI in production without overengineering your system
19 Nov 2021
How does the Atlan data platform help you ensure data quality?
Atlan - a tool for facilitating a collaborative data culture
15 Nov 2021
#AI in production
What should you learn as a data engineer?
Should you spend time learning data engineering tools and libraries?
12 Nov 2021
Shadow deployment vs. canary release of machine learning models
What is shadow deployment in machine learning? What is a canary release? What is the difference?
05 Nov 2021
How to deploy a Transformer-based model with custom preprocessing code to SageMaker Endpoints using BentoML
Deploy a machine learning model with custom inference code to a SageMaker Endpoint using BentoML
01 Oct 2021
How to teach your team to write automated tests?
How to teach writing automated tests: TDD, BDD, and other techniques
24 Sep 2021
Using AWS Deequ in Python with Python-Deequ
How to use Python-Deequ to validate Spark DataFrames
17 Sep 2021
Building and deploying ML models using the Qwak ML platform
What is the Qwak ML platform and how does it work?
03 Sep 2021
#AI in production
How to learn TDD
Learning Test-Driven Development is hard, and there is nothing we can do about it
27 Aug 2021
Data Engineering - the first principles
What is true in every data engineering project?
20 Aug 2021
How to deploy MLflow on Heroku
How to deploy MLflow on Heroku using PostgreSQL as the database, S3 as the artifact storage, and BasicAuth for authentication
06 Aug 2021