I help data engineering tech leads #makeDataTrustworthy because AI cannot learn from dirty data
Dependencies between DAGs: How to wait until another DAG finishes in Airflow?
In this article, I am going to show how to set up dependencies between two DAGs. Imagine that I have a DAG that dumps data from production databases and another...
17 Jul 2019
How to run Airflow in Docker (with a persistent database)
In this blog post, I am going to show you how to prepare a minimalist setup of the puckel/docker-airflow Docker image that will run a single DAG and store logs persistently...
15 Jul 2019
Using machine learning for software testing
How to sample production data to get a representative testing dataset?
12 Jul 2019
How to measure the similarity of sequence values
Levenshtein distance and Kendall tau distance
10 Jul 2019
Measuring document similarity in machine learning
How to measure the similarity of two datasets?
08 Jul 2019
Minkowski distance explained
05 Jul 2019
Why do most data science projects fail?
03 Jul 2019
Product/market fit - building a data-driven product
How to test a product idea?
30 Jun 2019
How to assign people to groups in a fair way using genetic algorithms
Using Helisa and Jenetics in Scala
21 Jun 2019
Genetic algorithms in Scala - solving optimization problems
Using Helisa and Jenetics to help Fallout players
19 Jun 2019
Re: DataOps Principles: How Startups Do Data The Right Way
Team vs. a bunch of individuals reporting work time in the same spreadsheet
17 Jun 2019
From Scala to Python - Python dataclasses
Domain model in Python
14 Jun 2019