Which method is more memory efficient: createOrReplaceTempView or saveAsTable?

I have a DataFrame loaded from a Hive table. After making some changes to it, I want to save it back to Hive as a new table. Which method should I use? Assume the DataFrame has 70 million records; I want the save to be memory- and time-efficient.

For example, with a DataFrame named df:

  1. df.createOrReplaceTempView("new_table_view")
    spark.sql("create table new_table as select * from new_table_view")

  2. df.write.saveAsTable("new_table")

There is 1 answer

scr

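The way I see it, there's no way option 1 can be more efficient. createOrReplaceTempView (there is no createOrReplaceView method on a DataFrame) only registers the DataFrame under a temporary, session-scoped view name; you can read about it in this previous question.

As such, between (1) reading from disk, registering a temp view, and then running a separate SQL statement that writes the same data to disk, and (2) reading from disk and writing straight to disk, number 2 seems the obvious favorite: at best, option 1 does the same work with an extra step.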
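To make the two options concrete, here is a minimal sketch of each as a small helper. The function names, the `spark` and `df` parameters, and the default view name `df_tmp_view` are my own assumptions for illustration; the only Spark calls used are `createOrReplaceTempView`, `spark.sql`, and `df.write.saveAsTable`.

```python
def save_via_temp_view(spark, df, table_name, view_name="df_tmp_view"):
    """Option 1: register a temp view, then CREATE TABLE ... AS SELECT from it."""
    # Registers df under a session-scoped name (view_name is a made-up default).
    df.createOrReplaceTempView(view_name)
    # The CTAS statement is the step that actually writes the new table.
    spark.sql(f"CREATE TABLE {table_name} AS SELECT * FROM {view_name}")


def save_direct(df, table_name):
    """Option 2: write the DataFrame straight to a table."""
    df.write.saveAsTable(table_name)


# Usage, assuming an existing SparkSession `spark` with Hive support
# and a DataFrame `df` already built from the source table:
#   save_via_temp_view(spark, df, "new_table")
#   save_direct(df, "new_table")
```

Option 2 is one call with no intermediate name to manage, which is part of why I would reach for it first.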

If this answer doesn't satisfy you, you can always try both ways and compare the total time and memorySeconds reported in the YARN application UI.
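If you would rather script that comparison than read it off the UI, something like the following could work. It is only a sketch: the helper names and the `rm_url` placeholder are my assumptions, and it relies on the YARN ResourceManager REST endpoint `/ws/v1/cluster/apps/{appid}`, whose report includes `elapsedTime` and `memorySeconds`.

```python
import json
import time
from urllib.request import urlopen


def timed(fn):
    """Run fn() and return the wall-clock seconds it took on the driver."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def app_usage(app_report):
    """Pull the two comparison metrics out of a YARN application report dict."""
    app = app_report["app"]
    return {"elapsed_ms": app["elapsedTime"], "memory_seconds": app["memorySeconds"]}


def fetch_app_report(rm_url, app_id):
    """Fetch the report from the ResourceManager REST API.

    rm_url is a placeholder for your own ResourceManager address,
    e.g. "http://<rm-host>:8088"; app_id can be taken from
    spark.sparkContext.applicationId.
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/apps/{app_id}") as resp:
        return json.load(resp)
```

Since memorySeconds is reported per application, run each option as its own Spark application (and write to a different table name each time) so the numbers are not mixed together.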