Career Coaching for Data Professionals
Building trustworthy data pipelines because AI cannot learn from dirty data
PySpark-Check - data quality validation for PySpark 3.0.0
Last week, I was testing whether we can use AWS Deequ for data quality validation. I ran into a few problems. First of all, it...
06 Jul 2020
The problem with software testing in data engineering
What if we found a bug in our data pipelines? What if that bug were easy to fix, but it would require a lot of...
15 Jun 2020
Data flow - what functional programming and Unix philosophy can teach us about data streaming
What does stream processing have in common with functional programming and Unix?
04 May 2020
Four books to boost your programmer career
I quit my dream job because of a book
06 Jan 2020
How to deploy a REST API AWS Lambda using Chalice and AWS Code Pipeline
How to create a REST API Endpoint using AWS Lambda, Chalice, and AWS Code Pipeline
16 Apr 2021
How to deploy a Tensorflow model using Sagemaker Endpoints and AWS Code Pipeline
How to build a Docker image using AWS Code Pipeline and deploy it as a Sagemaker Endpoint
09 Apr 2021
How to deal with days of the week in machine learning
How to encode week days as features for machine learning models
26 Mar 2021
On technical blogging
How to start blogging as a programmer
19 Mar 2021
Why do we use dropout in artificial neural networks?
How does dropout work in artificial neural networks?
12 Mar 2021
How to measure Spark performance and gather metrics about written data
How to track Spark metrics in AWS CloudWatch
05 Mar 2021
How to use AWS Batch to run a Python script
How to build a Docker image, define an AWS Batch job using Terraform, and run the AWS Batch job using Airflow
26 Feb 2021
Anomaly detection in Airflow DAG using Prophet library
How to detect problems in an Airflow pipeline using Prophet for time series anomaly detection
12 Feb 2021
How to test REST API contract using BDD
Testing a REST API using Behave in Python
05 Feb 2021
Testing data products: BDD for data engineers
How to use BDD to test PySpark code
29 Jan 2021
Definition of done for data engineers
When can data engineers be sure that they have done the task?
14 Jan 2021
Don't learn another programming language
Should you learn a new programming language this year?
07 Jan 2021