I am trying to generate TPC-H data at scale factors 300 and 1000 (SF300 and SF1000) on Databricks. However, my scripts have been running for over 24 hours now, and I suspect I did something wrong.
I followed the instructions at https://github.com/databricks/spark-sql-perf, then used the notebook in their repository (tpcds_datagen.scala) to generate the data. Of course, I modified the parameters to switch from TPC-DS to TPC-H. But it's extremely slow.
Could someone suggest a quicker way and help me out? Thanks in advance.