Delete Unicode value in output of Spark 1.6 using Scala


The file generated from the API contains data like the below:

col1,col2,col3
503004,(d$üíõ$F|'.h*Ë!øì=(.î;      ,.¡|®!®3-2-704

When I read it in Spark, it appears like this. I am using a case class to read it into an RDD and then converting it to a DataFrame using .toDF.

503004,������������,������������������������3-2-704

But I am trying to get a value like:

503004,dFh,3-2-704 (only the alphanumeric characters are retained)

I am using Spark 1.6 and Scala.

Please share your suggestions.

1 Answer

Answered by sangam.gavini (accepted):
This can be achieved using regexp_replace. (Note: in Spark 1.6 the `spark` SparkSession entry point does not exist yet; use `sc`/`sqlContext` instead.)

    // Bring in regexp_replace and the implicits that provide toDF and the $ column syntax
    import org.apache.spark.sql.functions.regexp_replace
    import sqlContext.implicits._

    val df = sc.parallelize(List(("503004", "d$üíõ$F|'.h*Ë!øì=(.î;      ,.¡|®!®", "3-2-704"))).toDF("col1", "col2", "col3")
    // Strip every character that is not an ASCII letter
    df.withColumn("col2_new", regexp_replace($"col2", "[^a-zA-Z]", "")).show()
Output:
+------+--------------------+-------+--------+
|  col1|                col2|   col3|col2_new|
+------+--------------------+-------+--------+
|503004|d$üíõ$F|'.h*Ë!øì=...|3-2-704|     dFh|
+------+--------------------+-------+--------+
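One caveat: the pattern `[^a-zA-Z]` keeps only ASCII letters, so any digits in the column would also be stripped. Since the question asks for alphanumeric values, `[^a-zA-Z0-9]` may be the safer class. The regex semantics are plain Java/Scala `replaceAll`, so they can be checked without a Spark cluster; this is a small standalone sketch using the sample value from the question:

```scala
object RegexCleanDemo {
  def main(args: Array[String]): Unit = {
    val raw = "d$üíõ$F|'.h*Ë!øì=(.î;      ,.¡|®!®"

    // Same pattern as the answer: keep only ASCII letters.
    // Accented letters like Ë fall outside [a-zA-Z] and are removed too.
    val lettersOnly = raw.replaceAll("[^a-zA-Z]", "")
    println(lettersOnly) // dFh

    // Broader class that also preserves digits, e.g. for values like "3-2-704".
    val alphanumeric = "3-2-704".replaceAll("[^a-zA-Z0-9]", "")
    println(alphanumeric) // 32704
  }
}
```

Whichever class is chosen, the same pattern string can be dropped straight into `regexp_replace`, since Spark uses Java regex syntax.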