mikulskibartosz.name
Bartosz Mikulski
Leveraging AI to drive growth and innovation
All Stories
Run a command on a remote server using SSH in Airflow
How to use the SSHHook in a PythonOperator to connect to a remote server from Airflow over SSH and execute a command
Use the ROW_NUMBER() function to get top rows by partition in Hive
How to calculate row number by partition in Hive and use it to filter rows
How to configure both core and spot instances in EMR using Terraform
Use EMR instance groups to add spot instances to an EMR cluster
How to temporarily disable an AWS Lambda function using AWS CLI without removing the function
Disable an AWS Lambda function using the AWS CLI
How to add an EMR step from AWS Lambda
How to configure a new EMR step using AWS Lambda in Python
Send event to AWS Lambda when a file is added to an S3 bucket
Trigger AWS Lambda when a file is created in an S3 bucket
Select Serverless configuration variables using the stage parameter
Use a custom function in Airflow templates
How to add a custom function to Airflow and use it in a template
Speed up counting the distinct elements in a Spark DataFrame
Use HyperLogLog to calculate the approximate number of distinct elements in Apache Spark
Pass parameters to SQL query when using PostgresOperator in Airflow
How to pass parameters to a SQL template when using PostgresOperator in Airflow
Use regexp_replace to replace a matched string with a value of another column in PySpark
Use regex to replace the matched string with the content of another column in PySpark
How to read multiple Parquet files with different schemas in Apache Spark
What to do when Apache Spark skips Parquet files with incompatible schemas