createOrReplaceTempView does not work on an empty DataFrame in PySpark 2.0.0


I am trying to define a SQL view on a PySpark (2.0.0) DataFrame and I get errors like "Table or View Not found". What I am doing:

1. Create an empty DataFrame.
2. Load data from a different location into a temporary DataFrame.
3. Append the temporary DataFrame to the main DataFrame (the empty one).
4. Define a SQL view on the main DataFrame (which was empty earlier).

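from pyspark import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructType, StructField, StringType, LongType,
                               IntegerType, DateType, TimestampType)

spark = SparkSession.builder.config(conf=SparkConf()).appName("mydailyjob").getOrCreate()
sc = spark.sparkContext

schema = StructType([StructField('vdna_id', StringType(), True),
                     StructField('miq_id', LongType(), True),
                     StructField('tags', IntegerType(), True),
                     StructField('dateserial', DateType(), True),
                     StructField('date_time', TimestampType(), True),
                     StructField('survey_id', StringType(), True),
                     StructField('ip', StringType(), True)])
# create the empty main DataFrame from the session (sqlContext is never defined here)
brandsurvey_feed = spark.createDataFrame(sc.emptyRDD(), schema)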

# load brandsurvey feed data for each date in date_list
for loc in all_loc:
    # load file from different location
    bs_tmp = spark.read.csv(loc, schema=schema, sep='\t', header=True)
    brandsurvey_feed = brandsurvey_feed.union(bs_tmp)

brandsurvey_feed.createOrReplaceTempView("brandsurvey_feed")
# show() prints the rows itself and returns None, so no print() is needed
spark.sql("select * from brandsurvey_feed").show()

There is 1 answer

Answered by braj

Folks, I think I found the reason. If we create a SQL view on a DataFrame with zero records and then access the table, we get the error "table or view does not exist". I would suggest checking that the DataFrame is not empty before defining any SQL view on it.
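
A minimal sketch of the emptiness check suggested above. The helper name register_view_if_nonempty is hypothetical and not part of PySpark. It relies only on DataFrame.head(n), which returns a list of at most n rows, and DataFrame.createOrReplaceTempView, so it should also work on 2.0.0, though that is an assumption rather than something tested against that release.

```python
def register_view_if_nonempty(df, view_name):
    """Register df as a temp view only if it holds at least one row.

    Hypothetical helper: df is assumed to be a pyspark.sql.DataFrame
    (anything exposing head(n) and createOrReplaceTempView works).
    Returns True if the view was registered, False otherwise.
    """
    # head(1) fetches at most one row, so it avoids a full count()
    if len(df.head(1)) == 0:
        return False
    df.createOrReplaceTempView(view_name)
    return True


# Usage with the DataFrame from the question (needs a running SparkSession):
#
# if register_view_if_nonempty(brandsurvey_feed, "brandsurvey_feed"):
#     spark.sql("select * from brandsurvey_feed").show()
# else:
#     print("brandsurvey_feed is empty, view not registered")
```

Returning a boolean lets the caller decide whether to skip the query or to log that none of the input locations produced rows.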