Splitting one row with date field into multiple rows with specified quarter dates

Question

Asked by ROma on 07 February 2024 at 07:30

I have a DataFrame with a start date and an end date on each row. I want to split each row into multiple rows, one for every predefined quarter that its date range overlaps. The predefined quarters (irrespective of year) are:

Q1: Apr-Jun
Q2: Jul-Sep
Q3: Oct-Dec
Q4: Jan-Mar

Each row has to be split between its start and end dates, with the splits falling on the predefined quarter boundaries.
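These quarters are a fiscal year starting in April, so the label is just the calendar month shifted by three. A minimal sketch of that mapping, where fiscal_quarter is a hypothetical helper name and not part of pandas:

```python
def fiscal_quarter(month: int) -> str:
    """Map a calendar month (1-12) to the question's quarter label.

    Q1 = Apr-Jun, Q2 = Jul-Sep, Q3 = Oct-Dec, Q4 = Jan-Mar.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # Shift so April becomes 0 and Jan-Mar wrap around to 9-11, then bucket by 3.
    return f"Q{(month - 4) % 12 // 3 + 1}"


print([fiscal_quarter(m) for m in range(1, 13)])
# ['Q4', 'Q4', 'Q4', 'Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q3', 'Q3', 'Q3']
```

The quarter boundaries themselves (end of Mar, Jun, Sep, Dec) are the same as calendar quarters; only the labels differ.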

Input DataFrame:

Pol_num	start_date	end_date
p1	2019-05-12	2020-05-11
p2	2018-11-28	2019-07-29

The output I want is below:

Pol_num	Quarter_start_date	Quarter_end_date	Quarter
p1	2019-05-12	2019-06-30	Q1
p1	2019-07-01	2019-09-30	Q2
p1	2019-10-01	2019-12-31	Q3
p1	2020-01-01	2020-03-31	Q4
p1	2020-04-01	2020-05-11	Q1
p2	2018-11-28	2018-12-31	Q3
p2	2019-01-01	2019-03-31	Q4
p2	2019-04-01	2019-06-30	Q1
p2	2019-07-01	2019-07-29	Q2

Can anyone help with this?

Original Q&A

There is 1 answer.

**mozway** · Answer 1 · 2024-02-07T07:47:21+00:00

One option is to generate all the quarter-end dates between each row's start and end with date_range, explode them into one row each, then post-process the result to compute Quarter_start_date and Quarter and to fix the last Quarter_end_date of each original row:
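To see what the first step produces, here it is in isolation for p1 (start, end and last are just illustrative variable names). QuarterEnd(0) rolls a date forward to its quarter end and leaves dates already on a quarter end unchanged, so the generated range always covers the quarter containing end_date:

```python
import pandas as pd

start = pd.Timestamp("2019-05-12")
end = pd.Timestamp("2020-05-11")

# Roll `end` forward to the end of its quarter (2020-06-30).
last = end + pd.offsets.QuarterEnd(0)

# Every quarter end (Mar/Jun/Sep/Dec) from `start` up to and including `last`.
ends = pd.date_range(start, last, freq=pd.offsets.QuarterEnd())
print(list(ends.strftime("%Y-%m-%d")))
# ['2019-06-30', '2019-09-30', '2019-12-31', '2020-03-31', '2020-06-30']
```

The last value (2020-06-30) overshoots end_date, which is why the answer then replaces the last Quarter_end_date of each row with end_date.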

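import pandas as pd

# sample data from the question
df = pd.DataFrame({'Pol_num': ['p1', 'p2'],
                   'start_date': ['2019-05-12', '2018-11-28'],
                   'end_date': ['2020-05-11', '2019-07-29']})

# ensure datetime
df[['start_date', 'end_date']] = (df[['start_date', 'end_date']]
                                  .apply(pd.to_datetime)
                                  )

out = (
 df.assign(Quarter_end_date=[pd.date_range(start, end+pd.offsets.QuarterEnd(0),
                                           freq=pd.offsets.QuarterEnd())
                             for start, end in zip(df['start_date'],
                                                   df['end_date'])])
   # one row per quarter end; explode leaves an object column, so cast it back
   .explode('Quarter_end_date')
   .astype({'Quarter_end_date': 'datetime64[ns]'})
   .assign(Quarter_start_date=lambda d: d['Quarter_end_date']
           .groupby(level=0).shift()      # previous quarter end of the same row
           .add(pd.Timedelta('1d'))       # plus one day
           .fillna(d['start_date']),      # first piece starts at start_date
           # keep the quarter end except on each row's last piece, which ends at end_date
           Quarter_end_date=lambda d: d['Quarter_end_date']
           .where(d.index.duplicated(keep='last'), d['end_date']),
           Quarter=lambda d: 'Q'+d['Quarter_end_date'].dt.quarter.astype(str)
          )
    [['Pol_num', 'Quarter_start_date', 'Quarter_end_date', 'Quarter']]
)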

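Output (note that .dt.quarter returns calendar quarters, Q1 = Jan-Mar, so Apr-Jun is labeled Q2 here rather than the Q1 requested in the question):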

  Pol_num Quarter_start_date Quarter_end_date Quarter
0      p1         2019-05-12       2019-06-30      Q2
0      p1         2019-07-01       2019-09-30      Q3
0      p1         2019-10-01       2019-12-31      Q4
0      p1         2020-01-01       2020-03-31      Q1
0      p1         2020-04-01       2020-05-11      Q2
1      p2         2018-11-28       2018-12-31      Q4
1      p2         2019-01-01       2019-03-31      Q1
1      p2         2019-04-01       2019-06-30      Q2
1      p2         2019-07-01       2019-07-29      Q3
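To get the question's April-based labels instead, one option is to convert the quarter-end dates to periods with a fiscal year ending in March ('Q-MAR'), whose quarter 1 is Apr-Jun. A minimal sketch comparing both labelings on the p1 dates from the output above:

```python
import pandas as pd

# Quarter_end_date values for p1, taken from the output above.
qend = pd.Series(pd.to_datetime(
    ["2019-06-30", "2019-09-30", "2019-12-31", "2020-03-31", "2020-05-11"]))

# Calendar quarters, as in the answer (Jan-Mar is Q1).
calendar = "Q" + qend.dt.quarter.astype(str)

# 'Q-MAR' periods belong to a fiscal year ending in March, so Apr-Jun is quarter 1.
fiscal = "Q" + qend.dt.to_period("Q-MAR").dt.quarter.astype(str)

print(pd.DataFrame({"Quarter_end_date": qend, "calendar": calendar, "fiscal": fiscal}))
```

In the answer's pipeline this amounts to using lambda d: 'Q'+d['Quarter_end_date'].dt.to_period('Q-MAR').dt.quarter.astype(str) for the Quarter column; the dates themselves do not change.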

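NB. You could also start by repeating the rows:

# number of quarters spanned by each row (period difference + 1)
n = (df['end_date'].dt.to_period('Q')
     .sub(df['start_date'].dt.to_period('Q'))
     .apply(lambda x: x.n).add(1)
    )

# one copy of each row per quarter it spans
out = df.loc[df.index.repeat(n)]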

Then compute the start date, end date and quarter of each copy by adding an increasing number of QuarterEnd offsets. However, since adding QuarterEnd offsets with a varying n and converting the period differences to integers (the .apply(lambda x: x.n) step) are not vectorized, this probably won't be any faster.
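One way to make the repeat idea vectorized (a sketch of a swapped-in technique, not from the answer) is to replace the period subtraction with integer arithmetic: number each calendar quarter as year * 4 + quarter - 1, so the count of quarters per row is a plain subtraction and each repeated row's quarter is the first quarter plus its position. split_by_fiscal_quarter and qnum are hypothetical names; the sketch assumes start_date <= end_date on every row and a unique index on df, and it emits the question's April-based labels directly.

```python
import numpy as np
import pandas as pd


def split_by_fiscal_quarter(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: integer quarter number, consecutive quarters differ by 1.
    def qnum(s: pd.Series) -> np.ndarray:
        return (s.dt.year * 4 + s.dt.quarter - 1).to_numpy()

    q_first = qnum(df["start_date"])
    n = qnum(df["end_date"]) - q_first + 1  # quarters spanned by each row

    out = df.loc[df.index.repeat(n)]
    k = out.groupby(level=0).cumcount().to_numpy()  # 0, 1, 2, ... within each row
    # q0 is the 0-based calendar quarter: 0 = Jan-Mar, ..., 3 = Oct-Dec.
    year, q0 = np.divmod(np.repeat(q_first, n) + k, 4)

    qstart = pd.to_datetime(pd.DataFrame({"year": year, "month": q0 * 3 + 1, "day": 1}))
    qend = qstart + pd.offsets.QuarterEnd(0)

    first = k == 0
    last = k == np.repeat(n, n) - 1
    return pd.DataFrame({
        "Pol_num": out["Pol_num"].to_numpy(),
        # Clip the first and last piece of each row to its own start/end dates.
        "Quarter_start_date": np.where(first, out["start_date"].to_numpy(), qstart.to_numpy()),
        "Quarter_end_date": np.where(last, out["end_date"].to_numpy(), qend.to_numpy()),
        # April-based label: calendar Apr-Jun (q0 = 1) -> Q1, Jan-Mar (q0 = 0) -> Q4.
        "Quarter": [f"Q{x}" for x in (q0 - 1) % 4 + 1],
    })


df = pd.DataFrame({
    "Pol_num": ["p1", "p2"],
    "start_date": pd.to_datetime(["2019-05-12", "2018-11-28"]),
    "end_date": pd.to_datetime(["2020-05-11", "2019-07-29"]),
})
result = split_by_fiscal_quarter(df)
print(result)
```

On the question's input this should reproduce the requested output table row for row; the only per-element Python loop left is building the label strings.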

TechQA.
