Building trustworthy data pipelines because AI cannot learn from dirty data
Why do we use dropout in artificial neural networks?
How does dropout work in artificial neural networks?
12 Mar 2021
How to measure Spark performance and gather metrics about written data
How to track Spark metrics in AWS CloudWatch
05 Mar 2021
How to use AWS Batch to run a Python script
How to build a Docker image, define an AWS Batch job using Terraform, and run the AWS Batch job using Airflow
26 Feb 2021
Anomaly detection in an Airflow DAG using the Prophet library
How to detect problems in an Airflow pipeline using Prophet for time series anomaly detection
12 Feb 2021
How to test REST API contract using BDD
Testing a REST API using Behave in Python
05 Feb 2021
Testing data products: BDD for data engineers
How to use BDD to test PySpark code
29 Jan 2021
Definition of done for data engineers
When can data engineers be sure that they have done the task?
14 Jan 2021
Don't learn another programming language
Should you learn a new programming language this year?
07 Jan 2021
How to read from a SQL table in PySpark using a query instead of specifying a table
Fetching data using a SQL query in PySpark
01 Jan 2021
How to restart a stuck Airflow DAG
What to do when an Airflow DAG gets stuck and does not want to run
31 Dec 2020
Why does the DayOfWeekSensor exist in Airflow?
How to make an Airflow DAG wait until a specified day of the week
30 Dec 2020
Send SMS from an Airflow DAG using AWS SNS
How to configure an SNS subscription to send SMS messages and use Airflow to send them
29 Dec 2020