I'm a bit new to Spark and Scala. I have a large (~1 million row) Scala Spark DataFrame that I need to turn into a JSON string. The schema of the DataFrame looks like this:

 |-- key: string (nullable = true)
 |-- value: string (nullable = true)

The key is a product ID, and the value is a set of products and their scores, read from a Parquet file. So far I have only managed to get something like the following (I simply concatenate the curly brackets onto the result):

3434343<tab>{smartphones:apple:0.4564879,smartphones:samsung:0.723643 }

But I expect a value like the one below, where each key is wrapped in double quotes so the value is valid JSON:

3434343<tab>{"smartphones:apple":0.4564879, "smartphones:samsung":0.723643 }
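To make the target format concrete: the only difference from my current output is that every key must be wrapped in double quotes, while the numeric scores stay unquoted. A plain-Scala sketch of that format (the helper `toJsonObject` is hypothetical, just for illustration; a real solution would also need to escape any quotes inside the keys):

```scala
// Hypothetical helper showing the target format: quote each key,
// leave the numeric score unquoted, wrap the pairs in braces.
def toJsonObject(entries: Seq[(String, Double)]): String =
  entries.map { case (k, v) => "\"" + k + "\":" + v }.mkString("{", ",", "}")

println(toJsonObject(Seq(
  "smartphones:apple" -> 0.4564879,
  "smartphones:samsung" -> 0.723643
)))
// {"smartphones:apple":0.4564879,"smartphones:samsung":0.723643}
```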

Is there any way to convert this directly into a JSON string without concatenating anything myself? I want to write the output files in .csv format. This is the code I'm using:

    val df = parquetReaderDF
      .withColumn("key", col("productId"))
      .withColumn("value", struct( /* ... columns elided ... */ ))

    val df2 = df
      .withColumn("valKey", concat( /* ... elided ... */ ))
      // ... grouping / collect_list step elided ...
      .map { r =>
        val key = r.getAs[String]("key")
        val value = r.getAs[Seq[String]]("collect_list(valKey)").mkString(",")
        (key, value)
      }
      .toDF("key", "valKey")
      .withColumn("valKey", concat(lit("{"), col("valKey"), lit("}")))

    df2.write
      .option("delimiter", "\t")
      .option("header", "false")
      .option("quoteMode", "yes")
      .csv( /* ... output path ... */ )
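For reference, here is a sketch of how the JSON could be produced without any manual concatenation, using Spark's built-in `to_json`, `map_from_entries` (Spark 2.4+), `collect_list`, and `struct` functions. The input column names (`productId`, `name`, `score`) are assumptions about the pre-aggregation shape of the data, one row per (product, entry, score):

```scala
import org.apache.spark.sql.functions._

// Sketch, assuming one row per (productId, name, score),
// e.g. ("3434343", "smartphones:apple", 0.4564879).
val out = parquetReaderDF
  .groupBy(col("productId").as("key"))
  .agg(
    // Collect (name, score) pairs into an array of structs, turn that
    // into a map<string,double>, then let to_json do quoting/escaping.
    to_json(map_from_entries(collect_list(struct(col("name"), col("score")))))
      .as("value")
  )

out.write
  .option("delimiter", "\t")
  .option("header", "false")
  .csv( /* ... output path ... */ )
```

`to_json` also escapes special characters inside the keys, which manual concatenation does not. One caveat: the CSV writer may itself quote the JSON field, since it contains quote characters; if that is a problem, building each line with `concat_ws("\t", col("key"), col("value"))` and writing via `.text(...)` is an alternative.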
