How to avoid writing the struct column name to the JSON file?

Question

51 views Asked by Dinesh Vashisth At 28 March 2024 at 09:25

How can I keep the struct column name from being written to the JSON file when writing a DataFrame?

I am using the Databricks PySpark write method:

df.write.option("header", "false").mode("overwrite").json(path)

I tried option("header", "false"), but the root key is still written.

Sample JSON file: {"struct_col_name":{"actual_struct_data_col":"values"....}}

I need to drop the first root key, struct_col_name.

[Images: sample DataFrame and its printSchema output]

Original Q&A

There are 3 answers

**ShaikMaheer** · Answer 1 · 2024-03-28T10:24:03+00:00

You cannot do that directly from the DataFrame.

Instead, consider writing the JSON first and then using Python code to read that JSON file and replace your column names with an empty string.
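A post-processing pass along these lines might look like the minimal sketch below. It assumes the output is JSON Lines (one object per line, which is what Spark's JSON writer produces) and, rather than blanking the key name, it unwraps the struct so its fields move to the top level. The helper names `hoist_struct` and `rewrite_json_lines` are hypothetical and used only for illustration. Note that Spark writes a directory of part files, so a pass like this would have to be applied to each part file.

```python
import json
import tempfile
from pathlib import Path


def hoist_struct(record: dict, struct_key: str) -> dict:
    """Return a copy of record with the fields of record[struct_key] moved to the top level."""
    nested = record.get(struct_key)
    if not isinstance(nested, dict):
        return dict(record)  # key missing or not an object: nothing to unwrap
    flat = {k: v for k, v in record.items() if k != struct_key}
    flat.update(nested)  # on a name clash, the nested field overwrites the outer one
    return flat


def rewrite_json_lines(src: Path, dst: Path, struct_key: str) -> None:
    """Copy a JSON Lines file from src to dst, unwrapping struct_key in every record.

    src and dst must be different files, because dst is truncated when opened.
    """
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            if line.strip():  # skip blank lines
                record = hoist_struct(json.loads(line), struct_key)
                fout.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "in.json", Path(tmp) / "out.json"
        src.write_text('{"struct_col_name":{"actual_struct_data_col":"values"}}\n')
        rewrite_json_lines(src, dst, "struct_col_name")
        print(dst.read_text(), end="")  # {"actual_struct_data_col": "values"}
```

This runs on a single machine and reads the file line by line, so it is only a reasonable fit when the written output is small enough to process outside Spark.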

**Bhavani** · Answer 2 · 2024-03-28T10:54:43+00:00

You can follow the procedure below to get the required format:

Here is a sample JSON, which is the JSON format of a data frame:

{"id":2,"name":"Alice","properties":{"age":"25","gender":"Female"}} 
{"id":1,"name":"John","properties":{"age":"30","gender":"Male"}}

Read the JSON into a Spark DataFrame and flatten its structure: use the select function together with alias to promote the nested properties fields to top-level columns. Use the following code:

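from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# In a Databricks notebook `spark` already exists; getOrCreate() simply reuses it.
spark = SparkSession.builder.getOrCreate()
json_df = spark.read.json("<jsonPath>/data.json")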

flattened_df = json_df.select(
    col("id"),
    col("name"),
    col("properties.age").alias("age"),
    col("properties.gender").alias("gender")
)

To see the flattened JSON, use the code below:

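# toJSON() yields one JSON string per row; collect() pulls them all to the driver,
# so use this only on small DataFrames.
json_rows = flattened_df.toJSON().collect()

# Print each row's JSON string
for row in json_rows:
    print(row)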

Output:

{"id":2,"name":"Alice","age":"25","gender":"Female"} 
{"id":1,"name":"John","age":"30","gender":"Male"}

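Write the DataFrame in JSON format using the following code. The header option applies to CSV and has no effect on JSON output, so it is omitted here. <outputPath> is a placeholder for a directory other than the input path, since overwriting a path that is still being read from can fail:

flattened_df.write.mode("overwrite").json("<outputPath>")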

**Vikas Sharma** · Answer 3 · 2024-03-28T21:51:00+00:00

Try this:

df_with_struct.select("final_struct_df.*").write.mode("overwrite").json(path)

Note: I am guessing you are just trying to write a flattened version of the dataframe. In that case, the aforementioned piece of code should work. If not, please let me know your required output by updating your question or in the comments to this answer.
