I have large data files delimited by the extended-ASCII character æ (hex E6). My code snippet for parsing the files is below, but the parser does not seem to split the values properly (I am using Spark 2.4.1):
implicit class DataFrameReadImplicits(dataFrameReader: DataFrameReader) {
  def readTeradataCSV(schema: StructType, path: String): DataFrame = {
    dataFrameReader.option("delimiter", "\u00E6")
      .option("header", "false")
      .option("inferSchema", "false")
      .option("multiLine", "true")
      .option("encoding", "UTF-8")
      .schema(schema)
      .csv(path)
  }
}
Sample file: https://gist.github.com/ashikaumanga/c2161eee07da9b10052a4e53bc4c567e
Any tips on how to fix this?
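
One thing that may be relevant: if the files really store the delimiter as the single byte E6, that byte is æ in ISO-8859-1 but is not a valid character on its own in UTF-8 (where æ is the two bytes C3 A6). The following is only a minimal sketch of that byte-level difference using plain JVM decoding, not Spark; the object name `DelimiterEncodingCheck` and the three-byte sample line are made up for illustration.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object DelimiterEncodingCheck {
  // Decode raw bytes with the given charset.
  // new String(...) replaces malformed input with U+FFFD instead of throwing.
  def decode(bytes: Array[Byte], charset: Charset): String =
    new String(bytes, charset)

  def main(args: Array[String]): Unit = {
    // Assumption: the file holds the delimiter as the single byte 0xE6.
    // Hypothetical sample line: 'a', 0xE6, 'b'.
    val line = Array[Byte]('a'.toByte, 0xE6.toByte, 'b'.toByte)

    val asLatin1 = decode(line, StandardCharsets.ISO_8859_1)
    val asUtf8   = decode(line, StandardCharsets.UTF_8)

    // Decoded as ISO-8859-1, the byte becomes 'æ' and the line splits in two.
    println(asLatin1.split('\u00E6').length) // prints 2

    // Decoded as UTF-8, the lone 0xE6 is malformed, so no 'æ' is present.
    println(asUtf8.contains('\u00E6')) // prints false
  }
}
```

If that assumption holds for my files, reading them with `.option("encoding", "ISO-8859-1")` instead of `"UTF-8"` might be worth trying, although I have not verified whether Spark 2.4.1 honors that option together with `multiLine`.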
