How to combine two DataFrames with no common columns in Apache Spark

This article is part of my "100 data engineering tutorials in 100 days" challenge. (74/100)

In this article, I will show you how to combine two Spark DataFrames that have no common columns.

For example, suppose we have the following two DataFrames:

import spark.implicits._ // required for the toDF conversion

val df1 = Seq(
        ("001", "002", "003"),
        ("004", "005", "006")
    ).toDF("A", "B", "C")
val df2 = Seq(
        ("011", "022", "033"),
        ("044", "055", "066")
    ).toDF("D", "E", "F")

The output I want to get looks like this:

+----+----+----+----+----+----+
|   A|   B|   C|   D|   E|   F|
+----+----+----+----+----+----+
| 001| 002| 003|null|null|null|
| 004| 005| 006|null|null|null|
|null|null|null| 011| 022| 033|
|null|null|null| 044| 055| 066|
+----+----+----+----+----+----+

This can be achieved with a full outer join whose join condition is set to false:

import org.apache.spark.sql.functions.lit

df1.join(df2, lit(false), "full")

It works because a full outer join keeps every row from both DataFrames, and the lit(false) join condition guarantees that no row of df1 ever matches a row of df2. Each row therefore appears in the output exactly once, with the other DataFrame's columns filled with null.
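If you prefer to avoid the join entirely, the same result can be produced by padding each DataFrame with the other's columns as nulls and then unioning the two. Below is a minimal sketch of that alternative; the combine helper is a name I made up for illustration, and it assumes Spark 2.3+ (for unionByName) and an active SparkSession.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Hypothetical helper: give both DataFrames the same schema by adding
// the missing columns as typed nulls, then union them by column name.
def combine(left: DataFrame, right: DataFrame): DataFrame = {
  // Add each of `right`'s columns to `left` as a null of the matching type.
  val leftPadded = right.schema.fields.foldLeft(left) { (df, field) =>
    df.withColumn(field.name, lit(null).cast(field.dataType))
  }
  // And vice versa: add each of `left`'s columns to `right` as a typed null.
  val rightPadded = left.schema.fields.foldLeft(right) { (df, field) =>
    df.withColumn(field.name, lit(null).cast(field.dataType))
  }
  // unionByName matches columns by name, not position, so the differing
  // column order of the two padded DataFrames does not matter.
  leftPadded.unionByName(rightPadded)
}

combine(df1, df2).show()
```

The join version is shorter, while the union version makes the schema alignment explicit and keeps the column types intact via the cast.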

Bartosz Mikulski * MLOps Engineer / data engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group