In my DataFrame I have a timestamp column with values in the format "%Y-%m-%d %H:%M:%m". I want to save this DataFrame to Parquet using:
df.write.partitionBy('p_timestamp').mode("overwrite").parquet('output/proto.parquet')
and I want the timestamp partition to be saved in the AWS data lake S3 timestamp format: "%Y-%m-%dT%H-%M"
I tried:
df.select(F.col("p_timestamp"), F.date_format(F.col("p_timestamp"), "%Y-%m-%dT%H-%M").alias("date_format")).show()
but I got an error:
Illegal pattern character 'T'
I would be grateful for any tips.
I think the formatting string you are looking for is "yyyy-MM-dd'T'HH-mm".
You get the error because the T is not quoted with ' to mark it as a literal string. You can consult Spark's Datetime Patterns for Formatting and Parsing if you need something a little different.
Small example:
Output: