I need to iterate over every column of a PySpark DataFrame and check the values in each row. If a column holds dates, in any date format, I need to identify it as a date column and determine its format. I'm trying to do this with a UDF built on the dateutil parser, but execution never seems to enter the function.
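from dateutil import parser
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DateType

def parse_date_string(date_str):
    # Return a date if the value parses, otherwise None (also covers null or non-string input).
    try:
        return parser.parse(date_str).date()
    except (ValueError, TypeError, OverflowError):
        return None

def functiontofinddatatype(df):
    date_parse_udf = udf(parse_date_string, DateType())
    for column in df.columns:
        # withColumn is lazy: the UDF only runs once an action such as show() is called.
        # One output column per input column, so earlier results are not overwritten.
        df = df.withColumn(f"{column}_parsed_date", date_parse_udf(col(column)))
    return df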
Please correct the code if it is wrong.
The parse_date_string function does not appear to run; the UDF never seems to call it.
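To make the goal concrete, here is a minimal plain-Python sketch of the per-value check I have in mind, outside Spark. The candidate list CANDIDATE_FORMATS and the helper name detect_date_format are hypothetical, and it uses datetime.strptime rather than dateutil, because parser.parse returns the parsed date but does not report which format matched.

```python
from datetime import datetime

# Hypothetical candidate list (an assumption): extend it with the
# formats actually expected in the data. Order matters for ambiguous values.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%Y%m%d"]


def detect_date_format(value):
    """Return the first candidate format that parses value, or None."""
    if not isinstance(value, str):
        # Nulls and non-string values are treated as "not a date".
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value.strip(), fmt)
            return fmt
        except ValueError:
            # This format did not match; try the next candidate.
            continue
    return None
```

A value such as 01/02/2023 fits both %d/%m/%Y and %m/%d/%Y, so this sketch simply reports whichever comes first in the list. Deciding the format for a whole column would probably need to look at all of its values rather than one row at a time.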