I know that there is some issue in the nth row of my Spark Scala DataFrame (say, the data type is not proper). When I try to write this DataFrame to Cassandra using Spark Structured Streaming, the write fails and the whole process stops there. In such a scenario, I want the erroneous record to be filtered out and inserted into some other DB, while the write to Cassandra continues for the rest of the records. I need this because until we identify and remove the erroneous record, the process doesn't move forward, which creates a huge lag on the Kafka producer side. Is it possible to identify and filter such a record? I am not able to find any solution for this on the internet.
Thanks,
So far I haven't found anything useful to try.
Found the solution for this. We can use foreachPartition with a Cassandra connection to write each record individually (and faster, since the connection is reused per partition) and separate out the erroneous records.
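For reference, here is a minimal sketch of that approach, assuming the DataStax spark-cassandra-connector is on the classpath. The Kafka broker/topic, the table my_keyspace.my_table(id, value), and the error handling are hypothetical placeholders to adapt to your own schema and side channel:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import com.datastax.spark.connector.cql.CassandraConnector
import scala.util.{Failure, Success, Try}

val spark = SparkSession.builder().appName("cassandra-sink").getOrCreate()

// Serializable connection factory from the DataStax spark-cassandra-connector
val connector = CassandraConnector(spark.sparkContext.getConf)

// Placeholder streaming source -- replace with your actual Kafka stream and parsing
val inputStream: DataFrame = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
  .option("subscribe", "my_topic")                     // hypothetical topic
  .load()
  .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS value")

def writeBatch(batch: DataFrame, batchId: Long): Unit = {
  batch.foreachPartition { (rows: Iterator[Row]) =>
    // One Cassandra session per partition, reused for every record in it
    connector.withSessionDo { session =>
      val stmt = session.prepare(
        "INSERT INTO my_keyspace.my_table (id, value) VALUES (?, ?)") // hypothetical table
      rows.foreach { row =>
        Try(session.execute(stmt.bind(row.getAs[AnyRef]("id"), row.getAs[AnyRef]("value")))) match {
          case Success(_) => // written to Cassandra
          case Failure(e) =>
            // Erroneous record: route it to your side channel here (another DB,
            // a dead-letter topic, ...) instead of failing the whole batch
            println(s"Skipping bad record $row: ${e.getMessage}")
        }
      }
    }
  }
}

// foreachBatch hands us a plain DataFrame per micro-batch, so the per-record
// try/catch above filters bad rows without aborting the streaming query
val query = inputStream.writeStream
  .foreachBatch(writeBatch _)
  .start()

query.awaitTermination()
```

The key point is that the per-record execute is wrapped individually, so one bad row only triggers the failure branch instead of killing the task, and preparing the statement once per partition keeps the writes fast.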