mikulskibartosz.name
Bartosz Mikulski
Building trustworthy data pipelines because AI cannot learn from dirty data
Featured
PySpark-Check - data quality validation for PySpark 3.0.0
Last week, I was testing whether we can use AWS Deequ for data quality validation. I ran into a few problems. First of all, it...
The problem with software testing in data engineering
What if we found a bug in our data pipelines? What if that bug were easy to fix, but it would require a lot of...
Data flow - what functional programming and Unix philosophy can teach us about data streaming
What does stream processing have in common with functional programming and Unix?
Four books to boost your programmer career
I quit my dream job because of a book
All Stories
How to use AWS Batch to run a Python script
How to build a Docker image, define an AWS Batch job using Terraform, and run the AWS Batch job using Airflow
Anomaly detection in Airflow DAG using Prophet library
How to detect problems in an Airflow pipeline using Prophet for time series anomaly detection
How to test REST API contract using BDD
Testing a REST API using Behave in Python
Testing data products: BDD for data engineers
How to use BDD to test PySpark code
Definition of done for data engineers
When can data engineers be sure that they have done the task?
Don't learn another programming language
Should you learn a new programming language this year?
How to read from SQL table in PySpark using a query instead of specifying a table
Fetching data using a SQL query in PySpark
How to restart a stuck Airflow DAG
What to do when an Airflow DAG gets stuck and does not want to run
Why does the DayOfWeekSensor exist in Airflow?
How to make an Airflow DAG wait until a specified day of the week
Send SMS from an Airflow DAG using AWS SNS
How to configure SNS subscription to send SMS messages and use Airflow to send them
How to emulate temporary tables in Athena
Use CTAS to create a temporary table in Athena
How to enable S3 bucket versioning using Terraform
How to configure S3 bucket versioning in Terraform