How to handle escape characters in PySpark: replacing the escape character '\026' with NULL in a DataFrame



The sequence '\026' is randomly spread throughout all the columns, and I have to replace '\026' with NULL across all columns.

Below is my sample input data:

col1,col2,col3,col4
1,\026\026,abcd026efg,1|\026\026|abcd026efg
2,\026\026,\026\026\026,2|026\026|\026\026\026
3,ad026eg,\026\026,3|ad026eg|\026\026
4,ad026eg,xyad026,4|ad026eg|xyad026

And my output data should be:

col1,col2,col3,col4
1,NULL,abcd026efg,1||abcd026efg|
2,NULL,NULL,2|NULL|NULL|
3,ad026eg,NULL,3|ad026eg|NULL|
4,ad026eg,xyad026,4|ad026eg|xyad026|

Note: col4 is col1, col2, and col3 combined, delimited by |.
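
For anyone trying this locally, here is a minimal sketch that rebuilds the sample above as a DataFrame. It assumes the '\026' sequences are literal backslash-escaped text (not the control character), and that every column is read as a string; the variable names are mine:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reproduce the sample input as an all-string DataFrame.
df = spark.createDataFrame(
    [
        ("1", "\\026\\026", "abcd026efg", "1|\\026\\026|abcd026efg"),
        ("2", "\\026\\026", "\\026\\026\\026", "2|026\\026|\\026\\026\\026"),
        ("3", "ad026eg", "\\026\\026", "3|ad026eg|\\026\\026"),
        ("4", "ad026eg", "xyad026", "4|ad026eg|xyad026"),
    ],
    ["col1", "col2", "col3", "col4"],
)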

 df.withColumn('col2', F.regexp_replace('col2', '\D\d+', None)).show()

This runs, but it replaces all of the cell values in the column with NULL.
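
For reference, a DataFrame-only sketch of the intended replacement (under the same assumption that '\026' is literal backslash-escaped text; the variable names here are illustrative): remove every '\026' with regexp_replace, then null out cells that end up empty with when/otherwise:

from pyspark.sql import functions as F

cleaned = df
for c in df.columns:
    # Remove every literal "\026" sequence from this column.
    stripped = F.regexp_replace(F.col(c), r"\\026", "")
    # A cell that contained only "\026" is now empty -> make it NULL.
    cleaned = cleaned.withColumn(
        c, F.when(stripped == "", F.lit(None)).otherwise(stripped)
    )
cleaned.show(truncate=False)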

1 Answer

Answered by Chandra Babu:

Try this if you want to do it with an RDD:

import re

# Strip every literal "\026", then map empty strings to None (NULL).
rddd = df.rdd.map(
    lambda x: [re.sub(r"\\026", "", v.strip()) for v in x]
).map(lambda x: [None if v == "" else v for v in x])

df2 = rddd.toDF(["a", "b", "c", "d"])

df2.show()
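
Note that this assumes every column is a string (e.g. the CSV was read without schema inference), since .strip() would fail on non-string values. To keep the original column names instead of a/b/c/d, passing df.columns should work just as well:

df2 = rddd.toDF(df.columns)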
