How can I round a Spark DataFrame column of hour-and-minute strings (e.g. "22:34") up or down to whole hours? It should be done with the PySpark API.

I already tried converting the time to a timestamp and then calculating the new time with unix_timestamp, but it does not work properly: values between "13:00" and "23:59" result in null, while values between "00:00" and "11:59" work. For example, "03:34" results in "4". (That nulls appear only from "13:00" onwards suggests the column was parsed with a 12-hour pattern, "hh:mm", rather than the 24-hour "HH:mm".)

    from pyspark.sql.functions import hour, round, unix_timestamp

    # Note: unix_timestamp without an explicit pattern expects
    # "yyyy-MM-dd HH:mm:ss"; pass "HH:mm" if Hour3 holds bare time strings.
    s_df = s_df.withColumn(
        "hour2",
        hour((round(unix_timestamp("Hour3", "HH:mm") / 3600) * 3600).cast("timestamp"))
    )

The rounded values should look as follows:

Original: "14:22", result after rounding: "14:00"
Original: "00:34", result after rounding: "01:00"
