I have a DataFrame and a list of columns as strings, and I need to add on a column to the original dataframe that is a | separated list of the values, where null gets turned into the string "null"

This is the code I currently have:

def mergeColumns(cols: Array[String], df: DataFrame, newName: String): DataFrame = {
  val dfColumns = cols.map( columnName => df(columnName) )
  df.withColumn(newName, concatenate(array(dfColumns: _*), lit("|")))
}
val concatenate: UserDefinedFunction = udf( (columns: Seq[Any], separator: String) => {
  columns.map({
    case null => "null"
    case default => default.toString
  }).mkString(separator)
})

And that code works, but only if all the provided columns are of the same type, and my current plan is to try to convert them all to String type. So I have two questions:
1) Is there a better way to do this in general?
2) If not, how can I convert all the dfColumns columns into strings?

2 Answers

3
Raphael Roth On Best Solutions

You can do this using mkString on the Row class :

val df =Seq(
  ("a", "b",Option.empty[String])
).toDF("Col1","Col2","Col3")

val makeString = udf((r : Row) => r.mkString("|"))

df.withColumn("newCol",makeString(struct("*")))
  .show()

gives

+----+----+----+--------+
|Col1|Col2|Col3|  newCol|
+----+----+----+--------+
|   a|   b|null|a|b|null|
+----+----+----+--------+
0
Shu On

Try with fill null values and concat_ws functions.

EX:

val df =Seq(
           ("a", "b",Option.empty[String]))
           .toDF("Col1","Col2","Col3")
df.na.fill("null")
     .withColumn("cn_ws",concat_ws("|", array("*")))
     .show(false)

Result:

+----+----+----+--------+
|Col1|Col2|Col3|cn_ws   |
+----+----+----+--------+
|a   |b   |null|a|b|null|
+----+----+----+--------+

UPDATE:

Cast all columns to "string"

df.na.fill("null")
     .withColumn("cn_ws", 
       concat_ws("|", array(df.columns.map(c => col(c).cast(StringType)): _*)))
     .show(false)