Generating TPC-H SF300 and SF1000 data


I am trying to generate TPC-H data at scale factors 300 and 1000 (SF300 and SF1000) on Databricks. However, my scripts have been running for over 24 hours now, so I suspect I am doing something wrong.

I followed the instructions at https://github.com/databricks/spark-sql-perf and then used the notebook in that repository (tpcds_datagen.scala) to generate the data. I modified its parameters to generate TPC-H instead of TPC-DS, but generation is extremely slow.

Could someone suggest a faster way to generate this data? Thanks in advance.
