I have a data frame with start date and end date. I want to split that one row into multiple rows with date range in pre defined quarters. pre defined quarters(irrespective of year) are: Q1-Apr-Jun Q2-Jul-Sep Q3-Oct-Dec Q4-Jan-Mar
The row has to be split between the start and end date but split on the pre defined quarter months.
Input dataFrame:
| Pol_num | start_date | end_date |
|---|---|---|
| p1 | 2019-05-12 | 2020-05-11 |
| p2 | 2018-11-28 | 2019-07-29 |
The output I want is below:
| Pol_num | Quarter_start_date | Quarter_end_date | Quarter |
|---|---|---|---|
| p1 | 2019-05-12 | 2019-06-30 | Q1 |
| p1 | 2019-07-01 | 2019-09-30 | Q2 |
| p1 | 2019-10-01 | 2019-12-31 | Q3 |
| p1 | 2020-01-01 | 2020-03-31 | Q4 |
| p1 | 2020-04-01 | 2020-05-11 | Q1 |
| p2 | 2018-11-28 | 2018-12-31 | Q3 |
| p2 | 2019-01-01 | 2019-03-31 | Q4 |
| p2 | 2019-04-01 | 2019-06-30 | Q1 |
| p2 | 2019-07-01 | 2019-07-29 | Q2 |
Can anyone help with this?
One option could be to generate all dates with
date_rangethen toexplode, then post-process the output to compute the Quarter_start_date and the Quarter, and fix the Quarter_end_date:Output:
NB. you could also start by repeating the rows with:
Then compute the start/end/quarter by shifting the dates with increasing QuarterEnd. However, since the addition of QuarterEnd and the conversion of periods to number of periods are not vectorized, this probably won't give any benefit.