I'm a bit new to Spark and Scala. I have a large (~1 million row) Scala Spark DataFrame, and I need to turn it into a JSON string. The schema of the DataFrame looks like this:

root
 |-- key: string (nullable = true)
 |-- value: string (nullable = true)
        |-- valKey (String)
        |-- valScore (Double)

key is a product ID, and value is a set of products and their score values that I read from a Parquet file. So far I have only managed to get output like this (the curly brackets I simply concatenate onto the result):

3434343<tab>{smartphones:apple:0.4564879,smartphones:samsung:0.723643 }

But I expect a value like this, where each key inside the braces is wrapped in double quotes:

3434343<tab>{"smartphones:apple":0.4564879, "smartphones:samsung":0.723643 }
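(For reference, valid JSON requires every key to be a double-quoted string. The quoting itself is simple in plain Scala; this is just an illustrative sketch with hard-coded sample values, not my actual pipeline:)

```scala
// Sample entries standing in for one product's valKey -> valScore pairs.
val entries = Map(
  "smartphones:apple"   -> 0.4564879,
  "smartphones:samsung" -> 0.723643
)

// Quote each key, join with commas, and wrap the whole thing in braces.
val json = entries
  .map { case (k, v) => s""""$k":$v""" }
  .mkString("{", ",", "}")
// → {"smartphones:apple":0.4564879,"smartphones:samsung":0.723643}
```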

Is there any way to convert this directly into a JSON string without concatenating anything? I'd like to write the output files in .csv format. This is the code I'm using:

// requires: import spark.implicits._ (for the .map over Dataset rows)
val df = parquetReaderDF
  .withColumn("key", col("productId"))
  .withColumn("value", struct(
    col("productType"),
    col("brand"),
    col("score")))
  .select("key", "value")

// After select("key", "value") the flat columns are gone, so the nested
// fields must be referenced through the struct, e.g. value.productType.
val df2 = df
  .withColumn("valKey", concat(
    col("value.productType"), lit(":"),
    col("value.brand"), lit(":"),
    col("value.score")))
  .groupBy("key")
  .agg(collect_list(col("valKey")).as("valKeys"))
  .map { r =>
    val key = r.getAs[String]("key")
    val value = r.getAs[Seq[String]]("valKeys").mkString(",")
    (key, value)
  }
  .toDF("key", "valKey")
  .withColumn("valKey", concat(lit("{"), col("valKey"), lit("}")))

df2.coalesce(1) // write df2, not df, so the aggregated values are saved
  .write.mode(SaveMode.Overwrite)
  .format("com.databricks.spark.csv")
  .option("delimiter", "\t")
  .option("header", "false")
  .option("quoteMode", "yes")
  .save("data.csv")
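One way to avoid concatenating braces and quotes by hand is to build a MapType column and let Spark's built-in `to_json` render it, since `to_json` quotes every map key itself. This is an untested sketch against the schema above, assuming Spark 2.4+ (for `map_from_arrays`):

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._

// Build "productType:brand" keys, collect them with their scores into a
// map per product, and let to_json emit properly quoted JSON.
val jsonDf = parquetReaderDF
  .withColumn("valKey", concat(col("productType"), lit(":"), col("brand")))
  .groupBy(col("productId").as("key"))
  .agg(to_json(
    map_from_arrays(
      collect_list(col("valKey")),
      collect_list(col("score")))).as("value"))

jsonDf.coalesce(1)
  .write.mode(SaveMode.Overwrite)
  .option("delimiter", "\t")
  .option("header", "false")
  .csv("data.csv")
```

Note that the CSV writer may itself quote and escape the embedded double quotes in the JSON string, so you may still need to adjust its quote-related options depending on the exact output you want.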
