Convert timestamp in pyspark data frame into Jalali date in Python


I have a PySpark DataFrame, and I want to convert one of its columns (a timestamp column) into a Jalali date.

My data frame:

Name CreationDate
Sara 2022-01-02 10:49:43
Mina 2021-01-02 12:30:21

I want the following result:

Name CreationDate
Sara 1400-10-12 10:49:43
Mina 1399-10-13 12:30:21

I tried the following code, but it does not work, and I cannot find a way to convert the date and time:

df_etl_test_piko1.select(jdatetime.datetime.col('creationdate').strftime("%a, %d %b %Y %H:%M:%S"))

Answer by blackbishop:

You need to define a UDF like this:

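import jdatetime
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.udf(StringType())
def to_jalali(ts):
    # Spark passes null timestamps as None; keep them null
    if ts is None:
        return None
    jts = jdatetime.datetime.fromgregorian(datetime=ts)
    return jts.strftime("%a, %d %b %Y %H:%M:%S")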

Then apply it to your example:

df = spark.createDataFrame([("Sara", "2022-01-02 10:49:43"), ("Mina", "2021-01-02 12:30:21")], ["Name", "CreationDate"])

# Cast CreationDate to timestamp type if it is not one already
# (needed here: createDataFrame above produces a string column)
df = df.withColumn("CreationDate", F.to_timestamp("CreationDate"))

df = df.withColumn("CreationDate", to_jalali("CreationDate"))

df.show(truncate=False)
#+----+-------------------------+
#|Name|CreationDate             |
#+----+-------------------------+
#|Sara|Sun, 12 Dey 1400 10:49:43|
#|Mina|Sat, 13 Dey 1399 12:30:21|
#+----+-------------------------+
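The output above spells out day and month names. To get the numeric layout asked for in the question (e.g. 1400-10-12 10:49:43), changing the format string in to_jalali to "%Y-%m-%d %H:%M:%S" should be enough. If jdatetime cannot be installed on the executors, the conversion can also be done in plain Python. The sketch below is an assumption and not part of the answer above: it uses a widely circulated 33-year-cycle arithmetic (not jdatetime's own code), the helper names gregorian_to_jalali and to_jalali_str are hypothetical, and it is only intended for modern dates. It reproduces the two rows from the question.

```python
from datetime import datetime

# Cumulative day counts before each Gregorian month (non-leap year).
_DAYS_BEFORE_MONTH = [0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334]


def gregorian_to_jalali(gy, gm, gd):
    """Convert a Gregorian (year, month, day) to a Jalali (year, month, day).

    Hypothetical helper using 33-year-cycle arithmetic; modern dates only.
    """
    # Count this year's leap day only once February has passed.
    gy2 = gy + 1 if gm > 2 else gy
    # Day count from a fixed epoch, including Gregorian leap days.
    days = (355666 + 365 * gy + (gy2 + 3) // 4 - (gy2 + 99) // 100
            + (gy2 + 399) // 400 + gd + _DAYS_BEFORE_MONTH[gm - 1])
    # Peel off whole 33-year (12053-day) and 4-year (1461-day) cycles.
    jy = -1595 + 33 * (days // 12053)
    days %= 12053
    jy += 4 * (days // 1461)
    days %= 1461
    # Remaining whole years in the 4-year cycle (its first year has 366 days).
    if days > 365:
        jy += (days - 1) // 365
        days = (days - 1) % 365
    # Months 1-6 have 31 days, months 7-11 have 30, Esfand has 29 or 30.
    if days < 186:
        jm, jd = 1 + days // 31, 1 + days % 31
    else:
        jm, jd = 7 + (days - 186) // 30, 1 + (days - 186) % 30
    return jy, jm, jd


def to_jalali_str(ts):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS' in the Jalali calendar."""
    if ts is None:  # keep nulls as nulls, as a Spark UDF should
        return None
    jy, jm, jd = gregorian_to_jalali(ts.year, ts.month, ts.day)
    # The time of day is the same in both calendars.
    return f"{jy:04d}-{jm:02d}-{jd:02d} {ts:%H:%M:%S}"


print(to_jalali_str(datetime(2022, 1, 2, 10, 49, 43)))  # 1400-10-12 10:49:43
print(to_jalali_str(datetime(2021, 1, 2, 12, 30, 21)))  # 1399-10-13 12:30:21
```

To use it in Spark, it could be wrapped the same way as to_jalali, e.g. a hypothetical to_jalali_str_udf = F.udf(to_jalali_str, StringType()), and applied to the timestamp column with withColumn.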