I have a column 'calendar_date' which contains dates from 2010 to 2024, and I have to filter between dates relative to a given date. For example, given the date '2023-09-15', I have to filter the calendar_date column to the 7 days before that day. I tried to use date_sub, but it did not work. I tried something like this:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, date_sub

spark = SparkSession.builder.appName("DateFilterExample").getOrCreate()

data = [("2023-09-01",), ("2023-09-02",), ("2023-09-03",), ("2023-09-04",), ("2023-09-05",),
        ("2023-09-06",), ("2023-09-07",), ("2023-09-08",), ("2023-09-09",), ("2023-09-10",),
        ("2023-09-11",), ("2023-09-12",), ("2023-09-13",), ("2023-09-14",), ("2023-09-15",),
        ("2023-09-16",), ("2023-09-17",), ("2023-09-18",), ("2023-09-19",), ("2023-09-20",)]
df = spark.createDataFrame(data, ["date_string"])

given_date = date_sub(col('date_string'), 0)
start_date = date_sub(given_date, 8)
end_date = date_sub(given_date, 1)

# Filter the DataFrame
filter_df = df.filter(
    (col("date_string") >= start_date) &
    (col("date_string") <= end_date)
).where(col('date_string') <= '2023-09-15')
filter_df.show()
It prints nothing:
+-----------+
|date_string|
+-----------+
+-----------+
I solved it. date_sub treats its first argument as a column (a plain string is interpreted as a column name, not a date), so the comparison above is evaluated row by row against the date_string column itself and never finds a match. You have to wrap the given date in lit() so Spark understands it as a literal value, like this:
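Here is a minimal sketch of that fix, assuming the goal is the seven days '2023-09-08' through '2023-09-14' before the given date (the original attempt used an offset of 8, which would also include '2023-09-07'); the to_date casts are an addition here so the string column is compared as a proper date:

from pyspark.sql.functions import col, date_sub, lit, to_date

# Wrap the given date in lit() so Spark treats it as a literal value,
# not as a reference to a column named '2023-09-15'.
given_date = to_date(lit('2023-09-15'))
start_date = date_sub(given_date, 7)  # 7 days before: '2023-09-08'
end_date = date_sub(given_date, 1)    # the day before: '2023-09-14'

filter_df = df.filter(
    (to_date(col('date_string')) >= start_date) &
    (to_date(col('date_string')) <= end_date)
)
filter_df.show()

With the sample data above, this returns the seven rows from '2023-09-08' through '2023-09-14'.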