Spark Thrift Server loads full dataset into memory before transmission via JDBC

The Spark Thrift Server tries to load the full dataset into memory before transmitting it via JDBC; on the JDBC client I receive this error:

SQL Error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (XX GB) is bigger than spark.driver.maxResultSize (XX GB)

Query: select * from table. Is it possible to enable something like a streaming mode for the Thrift Server? The main goal is to grant access from Pentaho ETL to a Hadoop cluster using Spark SQL via a JDBC connection. But if the Thrift Server has to load the full dataset into memory before transmission, this approach will not work.
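
For reference, the query goes through the Hive JDBC interface that the Thrift Server exposes; a minimal way to reproduce the failure from the command line is beeline (thrift-host and port 10000 are placeholders for your endpoint):

  beeline -u "jdbc:hive2://thrift-host:10000/default" -e "select * from table"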

There are 2 answers

Sanjai Verma

In your situation, increase the Spark driver memory and the maximum result size: spark.driver.memory=xG and spark.driver.maxResultSize=xG, per https://spark.apache.org/docs/latest/configuration.html
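
A minimal sketch of passing these at startup, assuming the stock sbin/start-thriftserver.sh launcher and placeholder sizes (8g/4g) that you would tune to your data:

  # start the Thrift Server with a larger driver heap and result-size limit
  ./sbin/start-thriftserver.sh \
    --driver-memory 8g \
    --conf spark.driver.maxResultSize=4g

Per the linked configuration page, spark.driver.maxResultSize=0 removes the limit entirely, but that only trades the error for the risk of an OutOfMemoryError on the driver.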

Triffids

Solution: spark.sql.thriftServer.incrementalCollect=true
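
A minimal sketch of enabling this at startup, again assuming the stock sbin/start-thriftserver.sh launcher:

  # fetch results incrementally instead of collecting them all to the driver
  ./sbin/start-thriftserver.sh \
    --conf spark.sql.thriftServer.incrementalCollect=true

With incremental collect enabled, the Thrift Server fetches the result one partition at a time rather than collecting the whole result set to the driver, so driver memory is bounded by the largest partition instead of the full output of select * from table.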