Empty DataFrame still inserts data on write - PySpark left anti join


I have two DataFrames whose left anti join returns no rows. However, when I write the join result to a table, rows from the first DataFrame are inserted anyway.

df1 = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:@dsafsda:123/fd") \
    .option("query", """select * from employees""") \
    .load()


df2 = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://das:132/wq") \
    .option("query", """select * from employees""") \
    .load()
    
    
# Keep only the df1 rows that have no (emp_id, dept_id) match in df2
df_Extract = df1.join(df2, (df2["emp_id"] == df1["emp_id"]) &
                           (df2["dept_id"] == df1["dept_id"]), "leftanti")

df_Extract.show()

df_Extract.write \
    .mode("append") \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://das:132/wq") \
    .option("dbtable", "employee_new") \
    .save()

job.commit()
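For reference, here is a minimal plain-Python model of what the left anti join is expected to return: only the df1 rows whose (emp_id, dept_id) pair has no match in df2. The helper `left_anti` and the sample rows are hypothetical. The model follows Spark's rule that `==` never matches NULL keys, so a left row with a NULL key is always kept.

```python
def _key(row, keys):
    """Build the join key tuple for one row (a dict)."""
    return tuple(row[k] for k in keys)


def left_anti(left, right, keys):
    """Return the rows of `left` that have no match in `right` on all `keys`."""
    # Right rows with a NULL key can never match, because NULL == x is not true
    right_keys = {k for k in (_key(r, keys) for r in right) if None not in k}
    # A left row survives only if its key is absent from the matchable right keys
    return [r for r in left if _key(r, keys) not in right_keys]


# Hypothetical sample rows, not the poster's data
oracle_rows = [
    {"emp_id": 1, "dept_id": 10, "name": "A"},
    {"emp_id": 2, "dept_id": 20, "name": "B"},
]
postgres_rows = [
    {"emp_id": 1, "dept_id": 10, "name": "A"},
    {"emp_id": 2, "dept_id": 20, "name": "B"},
]

print(left_anti(oracle_rows, postgres_rows, ["emp_id", "dept_id"]))  # → []
```

With these hypothetical rows every df1 row has a match in df2, so the anti join is empty and nothing should be appended.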

df_Extract contains no rows: df_Extract.show() prints an empty DataFrame.

However, df_Extract.write still inserts the rows from df1 into the table.

This looks weird. Any suggestions would be appreciated.
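One thing that may be worth ruling out: `show()` and `write` are two separate Spark actions, and each one re-runs the whole plan, including both JDBC reads. The sketch below uses a hypothetical helper, `write_if_not_empty` (not part of PySpark), that caches the DataFrame, counts it once, and only performs the append when there are rows, so the count and the write are based on the same computed result. This is an assumption-laden sketch: `cache()` is best-effort, and evicted partitions can still be recomputed from the sources.

```python
def write_if_not_empty(df, url, table):
    """Append `df` to `table` over JDBC only if it has rows (hypothetical helper).

    Returns the number of rows written, or 0 when the write is skipped.
    """
    cached = df.cache()       # ask Spark to keep the computed result for reuse
    n_rows = cached.count()   # first action: computes the plan and fills the cache
    if n_rows == 0:
        return 0              # nothing to insert, so skip the JDBC write entirely
    (cached.write
        .mode("append")
        .format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .save())
    return n_rows
```

Usage, with the placeholder connection values from the question: `write_if_not_empty(df_Extract, "jdbc:postgresql://das:132/wq", "employee_new")`. Logging the returned count next to the `show()` output makes it easier to see whether the write really ran on an empty result.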
