How to get week of month in Spark 3.0+?


I cannot find any datetime formatting pattern that returns the week of month in Spark 3.0+.

Since the 'W' pattern is deprecated, is there a way to get the week of month without using the legacy option?

The code below doesn't work in Spark 3.2.1:

df = df.withColumn("weekofmonth", f.date_format(f.col("Date"), "W"))

There are 4 answers

Sudhin On BEST ANSWER

You can try using a UDF:

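from calendar import monthcalendar

from pyspark.sql.functions import col, dayofmonth, month, udf, year
from pyspark.sql.types import IntegerType

df = spark.createDataFrame(
    [(1, "2022-04-22"), (2, "2022-05-12")], ("id", "date"))

def get_week_of_month(year, month, day):
    # monthcalendar() returns the month as a list of Monday-first weeks,
    # padded with 0 for days that fall outside the month.
    return next(
        (
            week_number
            for week_number, days_of_week in enumerate(monthcalendar(year, month), start=1)
            if day in days_of_week
        ),
        None,
    )

# Declare the return type; without it the UDF returns strings.
fn1 = udf(get_week_of_month, IntegerType())
df = df.withColumn('week_of_mon', fn1(year(col('date')), month(col('date')), dayofmonth(col('date'))))
df.show()  # display(df) is only available in Databricks notebooks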
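
To sanity-check the week logic without a Spark session, here is a minimal plain-Python sketch of the same lookup. The `firstweekday` parameter is my own addition (not part of the answer above); it shows that `monthcalendar` counts Monday-first weeks by default, and that a Sunday-first calendar gives different numbers for some dates.

```python
import calendar

def week_of_month(year, month, day, firstweekday=calendar.MONDAY):
    # Hypothetical helper: same idea as get_week_of_month above, but the
    # first day of the week is configurable.
    weeks = calendar.Calendar(firstweekday).monthdayscalendar(year, month)
    for week_number, days in enumerate(weeks, start=1):
        if day in days:
            return week_number
    return None  # day does not exist in that month

print(week_of_month(2022, 4, 22))                  # 4
print(week_of_month(2022, 5, 12))                  # 3
print(week_of_month(2022, 7, 3))                   # 1 (Monday-first weeks)
print(week_of_month(2022, 7, 3, calendar.SUNDAY))  # 2 (Sunday-first weeks)
```

So before using the UDF, decide which week convention you actually need.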


serge On

If you have a table with year, month and week numbers, sorted by year and week, you can try my solution:

select 
      year_iso,
      month,
      posexplode(collect_list(week_iso)) as (week_of_month, week_iso)
from your_table_with_dates
group by year_iso, month

Here we collect the week_iso column into an array grouped by year_iso and month, and then explode it back into two columns (the position inside the month, and week_iso).

Note that positions start at 0, but that's not a real problem: add 1 if you need 1-based week numbers.
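
To make the grouping step concrete, here is a rough plain-Python emulation of what the query does. The rows are made-up illustrative values (one row per week, already sorted), and `weeks_with_position` is a hypothetical name, not something from Spark.

```python
from itertools import groupby

# Hypothetical rows of (year_iso, month, week_iso), sorted by year and week.
rows = [
    (2022, 6, 22), (2022, 6, 23), (2022, 6, 24), (2022, 6, 25),
    (2022, 7, 26), (2022, 7, 27), (2022, 7, 28), (2022, 7, 29), (2022, 7, 30),
]

def weeks_with_position(rows):
    out = []
    # Group by (year_iso, month), like GROUP BY + collect_list ...
    for (year_iso, month), group in groupby(rows, key=lambda r: (r[0], r[1])):
        # ... then number the weeks inside each group from 0, like posexplode.
        for pos, (_, _, week_iso) in enumerate(group):
            out.append((year_iso, month, pos, week_iso))
    return out

for row in weeks_with_position(rows):
    print(row)
```

Like the query, this relies on the input order, so the result is only meaningful if the weeks arrive sorted within each month.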


ZygD On

For completeness, it's worth mentioning that one can set the configuration to "LEGACY".

from pyspark.sql import functions as F

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

df = spark.createDataFrame(
    [('2022-07-01',),
     ('2022-07-02',),
     ('2022-07-03',)],
    ['Date'])
df.withColumn("weekofmonth", F.date_format(F.col("Date"), "W")).show()
# +----------+-----------+
# |      Date|weekofmonth|
# +----------+-----------+
# |2022-07-01|          1|
# |2022-07-02|          1|
# |2022-07-03|          2|
# +----------+-----------+
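
If you would rather not switch the parser policy, the same numbers can probably be reproduced with plain date arithmetic. This is only a sketch, assuming weeks start on Sunday as in the output above; `week_of_month_sunday` and `week_of_month_expr` are hypothetical names, and the Spark version is an untested-by-this-snippet translation of the Python one using `dayofmonth`, `dayofweek` and `trunc`.

```python
import datetime as dt
import math

def week_of_month_sunday(d):
    # Plain-Python reference for the arithmetic (Sunday-first weeks).
    first = d.replace(day=1)
    # Same numbering as Spark's dayofweek: 1 = Sunday ... 7 = Saturday.
    first_dow = first.isoweekday() % 7 + 1
    return math.ceil((d.day + first_dow - 1) / 7)

def week_of_month_expr(col_name):
    # The same formula as a Spark column expression.
    # pyspark is imported lazily so the reference function works without it.
    from pyspark.sql import functions as F
    d = F.col(col_name)
    first_dow = F.dayofweek(F.trunc(d, "month"))
    return F.ceil((F.dayofmonth(d) + first_dow - 1) / 7).cast("int")

print(week_of_month_sunday(dt.date(2022, 7, 1)))  # 1
print(week_of_month_sunday(dt.date(2022, 7, 2)))  # 1
print(week_of_month_sunday(dt.date(2022, 7, 3)))  # 2
```

Usage would be something like df.withColumn("weekofmonth", week_of_month_expr("Date")), with no legacy setting involved.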
d-to-the-k On

I came up with this:

df = df.withColumn(
    "week_of_month",
    F.weekofyear("my_date") - F.weekofyear(F.date_trunc("month", "my_date")) + 1
)

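To obtain the week of the month, we take the week of the year for the given date and subtract the week of the year for the first day of its month, then add 1 so that counting starts at 1 rather than 0. Because weekofyear returns ISO 8601 week numbers, weeks start on Monday, and the result can be wrong in a January or December whose days fall into an ISO week of the neighbouring year. Example below: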

+----------+-------------+
|   my_date|week_of_month|
+----------+-------------+
|2023-12-01|            1|
|2023-12-02|            1|
|2023-12-03|            1|
|2023-12-04|            2|
+----------+-------------+
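
A quick way to see where the subtraction holds is to replay it in plain Python, assuming Python's `isocalendar()` numbers weeks the same way as Spark's `weekofyear` (both are documented as ISO 8601). The function names are hypothetical, and `by_day_arithmetic` is an alternative I am suggesting, not part of the answer above.

```python
import datetime as dt
import math

def by_iso_subtraction(d):
    # Mirrors weekofyear(date) - weekofyear(first day of month) + 1.
    first = d.replace(day=1)
    return d.isocalendar()[1] - first.isocalendar()[1] + 1

def by_day_arithmetic(d):
    # Monday-first week of month computed without week-of-year numbers.
    first = d.replace(day=1)
    return math.ceil((d.day + first.weekday()) / 7)  # weekday(): Monday = 0

for d in [dt.date(2023, 12, 3), dt.date(2023, 12, 4),
          dt.date(2023, 1, 2), dt.date(2024, 12, 30)]:
    print(d, by_iso_subtraction(d), by_day_arithmetic(d))
# 2023-12-03 1 1
# 2023-12-04 2 2
# 2023-01-02 -50 2
# 2024-12-30 -46 6
```

The two agree for the December 2023 example, but the subtraction goes negative when the first of the month or the date itself belongs to an ISO week of another year.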