The Spark Thrift Server appears to load the full dataset into driver memory before transmitting it over JDBC. On the JDBC client I'm receiving this error:
SQL Error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (XX GB) is bigger than spark.driver.maxResultSize (XX GB)
Query: select * from table. Is it possible to enable something like a streaming mode for the Thrift Server? The main goal is to grant access from Pentaho ETL to the Hadoop cluster using Spark SQL over a JDBC connection. But if the Thrift Server has to load the full dataset into memory before transmission, this approach will not work.
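For context, this is roughly how the same query behaves when issued against the Thrift Server over JDBC with the beeline client bundled with Spark (the host, port, and table name are placeholders for my environment):

    # Reproduce the failing query through the Thrift Server's JDBC endpoint;
    # thrift-server-host, 10000, and my_table are placeholders.
    beeline -u "jdbc:hive2://thrift-server-host:10000/default" \
            -e "SELECT * FROM my_table"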
In your situation, increase the Spark driver memory and the max result size, i.e. spark.driver.memory=xG and spark.driver.maxResultSize=xG, according to https://spark.apache.org/docs/latest/configuration.html
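As a rough sketch, assuming the Thrift Server is (re)started with the script bundled in Spark's sbin directory, the settings can be passed as --conf options (the xG values are placeholders; size them to your dataset). If the driver memory setting is ignored in your deployment mode, set it via the equivalent --driver-memory flag or in conf/spark-defaults.conf instead.

    # Restart the Thrift Server with a larger driver heap and result-size limit;
    # xG values below are placeholders.
    ./sbin/stop-thriftserver.sh
    ./sbin/start-thriftserver.sh \
      --conf spark.driver.memory=xG \
      --conf spark.driver.maxResultSize=xG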