I have a requirement if a matching df2 has a null value it should match. Spark doesn't match rows with null by default.
df1:
ID Name City EMAIL
1 John City A [email protected]
2 Mist City B [email protected]
3 Danny City C [email protected]
df2:
ID Name City EMAIL
1 John City A [email protected]
2 null City B [email protected]
3 Danny City C [email protected]
df3 = df1.join(df2, on=["ID", "NAME", "CITY"])
display(df3)
Spark Output:
ID Name City EMAIL EMAIL
1 John City A [email protected] [email protected]
3 Danny City C [email protected] [email protected]
Expected Output:
ID Name City EMAIL EMAIL
1 John City A [email protected] [email protected]
2 Mist City B [email protected] [email protected]
3 Danny City C [email protected] [email protected]
As shown above, since ID and CITY match and NAME have null values, the join should match and give the expected result.
And I can't remove NAME on the joining column it should match NAME, just if NAME is null then those columns also should match.
Kindly help.
Try this:
Output: