I'm trying to save a DataFrame as a CSV file, partitioned by a column:
import org.apache.spark.sql.types._

val schema = new StructType(
  Array(
    StructField("ID", IntegerType, true),
    StructField("State", StringType, true),
    StructField("Age", IntegerType, true)
  )
)

val df = sqlContext.read.format("com.databricks.spark.csv")
  .options(Map("path" -> filePath))
  .schema(schema)
  .load()
df.write.partitionBy("State").format("com.databricks.spark.csv").save(outputPath)
But the output is not saved with any partition info; it looks like partitionBy was completely ignored, and there were no errors. The same code works if I use the parquet format instead:
df.write.partitionBy("State").parquet(outputPath)
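(With parquet, the output directory contains one Hive-style subdirectory per value, e.g. outputPath/State=CA/, outputPath/State=NY/, and so on.)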
What am I missing here?
partitionBy support has to be implemented as part of a given data source, and as of now (v1.3) it is not supported in Spark CSV. See: https://github.com/databricks/spark-csv/issues/123
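Until that is implemented, a possible workaround is to emulate the partition layout manually by writing one directory per distinct value. This is only a sketch, reusing df and outputPath from your question, and it assumes State has a small number of distinct values:

// Collect the distinct partition values, then write each subset
// of rows into a Hive-style State=<value> subdirectory
val states = df.select("State").distinct().collect().map(_.getString(0))
states.foreach { state =>
  df.filter(df("State") === state)
    .write.format("com.databricks.spark.csv")
    .save(s"$outputPath/State=$state")
}

Note that, unlike the built-in partitionBy, this keeps the State column in the output files and runs one Spark job per distinct value, so it is only practical for low-cardinality columns.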