mikulskibartosz.name
Bartosz Mikulski
Leveraging AI to drive growth and innovation
All Stories
How to concatenate columns in a PySpark DataFrame
How to use the concat and concat_ws functions to merge multiple columns into one in PySpark
How to derive multiple columns from a single column in a PySpark DataFrame
Extract multiple columns from a single column using the withColumn function and a PySpark UDF
Broadcast variables and broadcast joins in Apache Spark
How to speed up joins of small DataFrames by using the broadcast join
How to use the window function to get a single row from each group in Apache Spark
How to group values by a key and extract a single row from each group in Apache Spark
How to make a pivot table in AWS Athena or PrestoSQL
How to make a pivot table in AWS Athena, and why the pivot function does not exist
What is the difference between repartition and coalesce in Apache Spark?
When should you use coalesce instead of repartition in Apache Spark?
How to pivot an Apache Spark DataFrame
How to turn an Apache Spark or PySpark DataFrame into a pivot table.
What is the difference between cache and persist in Apache Spark?
When you should use the cache function, and when you should use the persist function
Why your company should use PrestoSQL
Should your team use PrestoSQL?
Is counting rows all we can do?
How do you detect problems in data pipelines before they turn into hard-to-debug bugs? I wish I knew.
How to Speed Up AWS Athena Queries Using Partition Projection
How to define partition projection while creating an Athena table
How to send a customized Slack notification when an Airflow task fails
How to customize a Slack notification before sending it to the Slack incoming webhook.