Initial data is in Dataset<Row> and I am trying to write to pipe delimited file and I want each non empty cell and non null values to be placed in quotes. Empty or null values should not contain quotes
result.coalesce(1).write()
.option("delimiter", "|")
.option("header", "true")
.option("nullValue", "")
.option("quoteAll", "false")
.csv(Location);
Expected output:
"London"||"UK"
"Delhi"|"India"
"Moscow"|"Russia"
Current Output:
London||UK
Delhi|India
Moscow|Russia
If I change the "quoteAll" to "true", output I am getting is:
"London"|""|"UK"
"Delhi"|"India"
"Moscow"|"Russia"
Spark version is 2.3 and java version is java 8
Java answer. CSV escape is not just adding " symbols around. You should handle " inside strings. So let's use StringEscapeUtils and define UDF that will call it. Then just apply the UDF to each of the column.
Side note: coalesce(1) is a bad call. It collect all data on one executor. You can get executor OOM in production for huge dataset.