I am reading data from Cassandra as follows:
df = spark.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(**configs) \
    .options(table=tablename, keyspace=keyspace) \
    .option("ssl", True) \
    .option("sslmode", "require") \
    .load()
Now this df is a PySpark DataFrame. I am able to call show() and printSchema() on it, but when I run

df.count()

it throws this error:
An error was encountered:
An error occurred while calling o1394.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 48.0 failed 4 times, most recent failure: Lost task 19.3 in stage 48.0 (TID 2053, js-56258-63801-i-32-w-1.net, executor 9): java.lang.IllegalArgumentException: requirement failed: Column not found in Java driver Row: count
How can I resolve this issue? Thanks in advance.
I'm assuming it's not failing at the same stage all of the time. If that's the case, then you can try tuning the read/write parameters:
https://github.com/datastax/spark-cassandra-connector/blob/b2.4/doc/reference.md#read-tuning-parameters
https://github.com/datastax/spark-cassandra-connector/blob/b2.4/doc/reference.md#write-tuning-parameters
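For example, here is a minimal sketch of lowering the read split and fetch sizes per-read. The option names (input.split.sizeInMB, input.fetch.sizeInRows) come from the b2.4 reference linked above; the values are illustrative assumptions, not recommendations, so tune them against your own cluster:

# Assumes spark, configs, tablename, and keyspace from the question above.
df = (spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(**configs)
      .options(table=tablename, keyspace=keyspace)
      .option("ssl", True)
      .option("sslmode", "require")
      # Smaller Spark partitions (default is 512 MB per split)
      .option("spark.cassandra.input.split.sizeInMB", "64")
      # Smaller CQL fetch pages (default is 1000 rows per fetch)
      .option("spark.cassandra.input.fetch.sizeInRows", "500")
      .load())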
When you start pyspark, you'll need to pass these in as --conf spark.cassandra.<option>=<value> flags.
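For example (same option names as above; the values are illustrative):

pyspark \
  --conf spark.cassandra.input.split.sizeInMB=64 \
  --conf spark.cassandra.input.fetch.sizeInRows=500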