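import spark.implicits._  // provides toDF (already imported automatically in spark-shell)
val value: String = "\u0001" + "V1" + "\u0002"  // ^A + "V1" + ^B
val df = Seq(value).toDF("f1")
df.show()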
Now df holds the expected value in field f1. But when I write it out with Spark's built-in CSV format using the code below, the ^A and ^B characters do not show up in the output.
df.write.format("csv").option("delimiter", "\t").option("codec", "bzip2").save("temp_out")
The output in temp_out does not contain any ^A or ^B character for field f1.
Any suggestions would be appreciated.
If Spark's save operation is dropping certain characters, you'll notice that those bytes are missing when you open the output CSV file(s). First, take a look at the bytes in value:
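A minimal sketch of that check in plain Scala, with no Spark involved: it prints every UTF-8 byte of value as two hex digits, so the control characters become visible as 01 and 02. The object name InspectBytes and the helper hexBytes are made up for illustration.

```scala
import java.nio.charset.StandardCharsets

object InspectBytes {
  // Same value as in the question: ^A, then "V1", then ^B
  val value: String = "\u0001" + "V1" + "\u0002"

  // Render each UTF-8 byte as two lowercase hex digits (mask to 0..255 first)
  def hexBytes(s: String): Seq[String] =
    s.getBytes(StandardCharsets.UTF_8).toSeq.map(b => f"${b & 0xff}%02x")

  def main(args: Array[String]): Unit =
    println(hexBytes(value).mkString(" ")) // prints: 01 56 31 02
}
```

Running the same kind of check on the decompressed part files (for example with bzcat and od -c) tells you whether those bytes actually reached the disk.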
saveAsTextFile has been around for a while and is a bit more straightforward. If you can't get the CSV option to work, it is a good workaround. You'll probably still be able to read the file with the reader's csv method, without any dropped characters, as below (but you'll want to confirm this with your specific setup):
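A sketch of both routes side by side, assuming Spark 2.2 or later on the classpath; the object name, the writeAndReadBack method and the output directories are hypothetical. Option A keeps the CSV writer but sets ignoreLeadingWhiteSpace and ignoreTrailingWhiteSpace to false; Spark's documentation lists both as defaulting to true when writing, and my assumption (please verify on your version) is that the underlying Univocity writer treats characters at or below the space character, ^A and ^B included, as whitespace and trims them. Option B is the saveAsTextFile workaround. Both outputs are then read back with the DataFrame csv reader, which does not trim by default.

```scala
import org.apache.hadoop.io.compress.BZip2Codec
import org.apache.spark.sql.SparkSession

object ControlCharWorkaround {
  // Same value as in the question: ^A, then "V1", then ^B
  val value: String = "\u0001" + "V1" + "\u0002"

  // Writes `value` two ways under `baseDir` and returns what the CSV reader sees for each.
  // Hypothetical layout: baseDir/csv_out and baseDir/text_out must not exist yet.
  def writeAndReadBack(spark: SparkSession, baseDir: String): (String, String) = {
    import spark.implicits._
    val df = Seq(value).toDF("f1")

    // Option A: CSV writer with leading/trailing whitespace trimming switched off.
    // Assumption: this trimming is what strips the control characters.
    val csvDir = s"$baseDir/csv_out"
    df.write
      .format("csv")
      .option("delimiter", "\t")
      .option("codec", "bzip2")
      .option("ignoreLeadingWhiteSpace", "false")
      .option("ignoreTrailingWhiteSpace", "false")
      .save(csvDir)

    // Option B: bypass the CSV writer. Each row becomes one tab-separated line
    // (null fields turn into the text "null") written as bzip2-compressed plain text.
    val textDir = s"$baseDir/text_out"
    df.rdd
      .map(row => row.toSeq.mkString("\t"))
      .saveAsTextFile(textDir, classOf[BZip2Codec])

    // Read a directory back with the csv reader and return the first field of the first row
    def readF1(dir: String): String =
      spark.read.option("delimiter", "\t").csv(dir).head().getString(0)

    (readF1(csvDir), readF1(textDir))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("control-chars").master("local[*]").getOrCreate()
    try {
      val baseDir = args.headOption.getOrElse("/tmp/control_chars_demo")
      val (fromCsv, fromText) = writeAndReadBack(spark, baseDir)
      // Print code points so that ^A (1) and ^B (2) are visible
      println(fromCsv.map(_.toInt).mkString(" "))
      println(fromText.map(_.toInt).mkString(" "))
    } finally spark.stop()
  }
}
```

If the first line printed already starts with 1 and ends with 2, Option A is enough and the RDD route isn't needed; otherwise fall back to Option B.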