Specifying column with multiple datatypes in Spark Schema


I am trying to create a schema to parse JSON into a Spark DataFrame.

The JSON has a column, value, which can be either a struct or a string:

"value": {
    "entity-type": "item",
    "id": "someid",
    "numeric-id": 30
  }

"value": "SomePicture.jpg",

How can I specify that in the schema?


There are 2 answers

1
Ether On
{
  "type": ["object", "string"],
  "properties": { ... }
}

https://json-schema.org/understanding-json-schema/index.html

0
Neha Zaveri On

I solved it using the approach below.

In a JSON schema it can be done the way specified above, but that does not work when defining a Spark schema. For the Spark schema I had to read value as a string, determine from certain conditions whether a given value is actually a struct, and then use from_json(value, new StructType()) to parse the string into a struct.
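
A minimal sketch of that approach in Scala, under several assumptions that are not in the original answer: the object name MixedValueColumn, the output column names value_struct and value_string, the field types in entitySchema, and the "starts with {" test standing in for the unspecified "certain conditions". It also relies on Spark's JSON reader keeping the raw JSON text of an object when the field is declared as StringType, which is worth verifying on your Spark version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, when}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Hypothetical object name, used only for this illustration.
object MixedValueColumn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mixed-value-column")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Two sample records modelled on the question:
    // `value` is an object in the first and a plain string in the second.
    val raw = Seq(
      """{"value": {"entity-type": "item", "id": "someid", "numeric-id": 30}}""",
      """{"value": "SomePicture.jpg"}"""
    ).toDS()

    // Step 1: declare `value` as a string so that both shapes can be read.
    val readSchema = StructType(Seq(StructField("value", StringType, nullable = true)))
    val df = spark.read.schema(readSchema).json(raw)

    // Step 2: schema for the struct variant (field types are assumed from the sample).
    val entitySchema = new StructType()
      .add("entity-type", StringType)
      .add("id", StringType)
      .add("numeric-id", LongType)

    // Step 3: assumed condition: treat a value that starts with "{" as a struct.
    // This is a heuristic; a plain string beginning with "{" would be misclassified.
    val looksLikeStruct = col("value").startsWith("{")

    // Step 4: parse struct-like values with from_json and keep the rest as strings.
    // `when` without `otherwise` leaves null in the rows where the condition is false.
    val parsed = df
      .withColumn("value_struct", when(looksLikeStruct, from_json(col("value"), entitySchema)))
      .withColumn("value_string", when(!looksLikeStruct, col("value")))

    parsed.printSchema()
    parsed.show(false)

    spark.stop()
  }
}
```

Splitting the result into two typed columns avoids needing a single column with two datatypes, which a Spark schema cannot express. If your real condition depends on another field (for example a datatype marker next to value), swap it in for looksLikeStruct.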