mikulskibartosz.name
Bartosz Mikulski
Building trustworthy data pipelines because AI cannot learn from dirty data
Featured
PySpark-Check - data quality validation for PySpark 3.0.0
Last week, I was testing whether we can use AWS Deequ for data quality validation. I ran into a few problems. First of all, it...
The problem with software testing in data engineering
What if we found a bug in our data pipelines? What if that bug were easy to fix, but it would require a lot of...
Data flow - what functional programming and Unix philosophy can teach us about data streaming
What does stream processing have in common with functional programming and Unix?
Four books to boost your programmer career
I quit my dream job because of a book
All Stories
Definition of done for data engineers
When can data engineers be sure that they have done the task?
Don't learn another programming language
Should you learn a new programming language this year?
How to read from SQL table in PySpark using a query instead of specifying a table
Fetching data using a SQL query in PySpark
How to restart a stuck Airflow DAG
What to do when an Airflow DAG gets stuck and does not want to run
Why does the DayOfWeekSensor exist in Airflow?
How to make an Airflow DAG wait until a specified day of the week
Send SMS from an Airflow DAG using AWS SNS
How to configure SNS subscription to send SMS messages and use Airflow to send them
How to emulate temporary tables in Athena
Use CTAS to create a temporary table in Athena
How to enable S3 bucket versioning using Terraform
How to configure S3 bucket versioning in Terraform
How to get a notification when a new file is uploaded to an S3 bucket
Get a Slack notification when a file is uploaded to an S3 bucket
Get an XCom value in the Airflow on_failure_callback function
How to get the task instance in on_failure_callback so you can access XCom
Add the row insertion time to a MySQL table
Automatically add the insertion and update time in MySQL
Best practices about partitioning data in S3 by date
How to partition data in S3 by date in a way that makes your life easier