Spark PySpark StructType StructField json to avro


I am trying to read a JSON file and write it to Avro with logicalType set to "timestamp-micros". I use the PySpark StructType and StructField classes to specify the DataFrame schema programmatically.

The schema is applied correctly through StructField, but the input data is corrupted because the value appears to be converted to epoch format a second time. The intention is just to read the JSON and write it to Avro with the logicalType set, without any transformation of the date field.

My JSON sample (filename: test.json):

[{
"name": "test1",
"Date_of_Sale": 1528955439000
}]

>>> from pyspark.sql.types import StructType, StructField, TimestampType

>>> myschema = StructType([StructField("Date_of_Sale", TimestampType(), True)])

>>> df_bi = spark.read.json('/tmp/test.json', myschema, multiLine=True).na.drop("all")

>>> df_bi.show()
+--------------------+
|        Date_of_Sale|
+--------------------+
|50420-09-01 14:10...|
+--------------------+

>>> df_bi.write.format("avro").save("test")

Schema from Write:

{
  "type" : "record",
  "name" : "topLevelRecord",
  "fields" : [ {
    "name" : "Date_of_Sale",
    "type" : [ {
      "type" : "long",
      "logicalType" : "timestamp-micros"
    }, "null" ]
  } ]
}

Data from Avro as JSON output:
{"Date_of_Sale":{"long":1528955439000000000}}
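
The numbers above are consistent with the input value being treated as epoch seconds rather than epoch milliseconds. The plain-Python check below (no Spark involved) reproduces both the year 50420 shown by show() and the long written to Avro. The variable names are made up for illustration, and the "interpreted as seconds" reading is an assumption based on the arithmetic, not something confirmed from Spark's source.

```python
from datetime import datetime, timezone

raw = 1528955439000  # Date_of_Sale value from test.json

# Reading 1: raw is epoch milliseconds (what the sample data appears to be).
as_millis = datetime.fromtimestamp(raw / 1000, tz=timezone.utc)
print(as_millis.isoformat())  # 2018-06-14T05:50:39+00:00

# Reading 2 (assumption): raw is taken as epoch seconds when read as TimestampType.
# Converting those seconds to microseconds gives the long seen in the Avro output.
micros_if_seconds = raw * 1_000_000
print(micros_if_seconds)  # 1528955439000000000

# Rough calendar year for that many seconds after 1970,
# using the mean Gregorian year of 31,556,952 seconds.
approx_year = 1970 + raw // 31_556_952
print(approx_year)  # 50420

# For comparison: the microsecond value if raw is epoch milliseconds.
micros_if_millis = raw * 1_000
print(micros_if_millis)  # 1528955439000000
```

If that reading is right, the long that timestamp-micros would need for the intended 2018 timestamp is 1528955439000000, a factor of 1000 smaller than what was written.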

Any suggestions are appreciated!
