I have a table that looks like this:

import pandas as pd

temp = [['K98R', 'AB', 34, '27-07-2010', '17-08-2013', '2008-03-01', '2011-05-02', 44],
        ['S33T', 'ES', 55, '2009-07-23', '2012-03-12', '2010-09-17', '', 76]]
Data = pd.DataFrame(temp, columns=['ID', 'Initials', 'Age', 'Entry', 'Exit', 'Event1', 'Event2', 'Weight'])

As the table above shows, each patient has an entry date and an exit date, plus dates for events 1 and 2. The date for event 2 is missing for the second patient because the event didn't happen. Also note that event 1 for the first patient happened before the entry date.

What I am trying to achieve is threefold:
1. Split the time between entry and exit into calendar years
2. Convert the wide format to a long one with one row per year
3. Check whether events 1 and 2 have occurred by the end of the time period covered by each row
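To make step 1 concrete, here is a minimal sketch of the splitting logic on its own, using only the standard library. The helper name `split_into_years` is made up for illustration, and it assumes the entry date is not later than the exit date.

```python
from datetime import date

def split_into_years(entry, exit_):
    """Split [entry, exit_] into pieces that each stay within one calendar year.

    Hypothetical helper; assumes entry <= exit_.
    """
    pieces = []
    for year in range(entry.year, exit_.year + 1):
        # Clip each calendar year to the patient's own entry/exit window.
        start = max(entry, date(year, 1, 1))
        end = min(exit_, date(year, 12, 31))
        pieces.append((start, end))
    return pieces

# First patient: entry 27-07-2010, exit 17-08-2013
pieces = split_into_years(date(2010, 7, 27), date(2013, 8, 17))
for start, end in pieces:
    print(start, end)
```

Only the first and last pieces are partial years; everything in between runs from 1 January to 31 December.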

To explain further, here is the output I am trying to get:

ID    Initials  Age  Entry       Exit        Event1  Event2  Weight
K98R  AB        34   27/07/2010  31/12/2010  1       0       44
K98R  AB        35   1/01/2011   31/12/2011  1       1       44
K98R  AB        36   1/01/2012   31/12/2012  1       1       44
K98R  AB        37   1/01/2013   17/08/2013  1       1       44
S33T  ES        55   23/07/2009  31/12/2009  0       0       76
S33T  ES        56   1/01/2010   31/12/2010  1       0       76
S33T  ES        57   1/01/2011   31/12/2011  1       0       76
S33T  ES        58   1/01/2012   12/03/2012  1       0       76

Notice that the period from entry to exit is split into one row per calendar year for each patient. The event columns are now coded as 0 (the event has not yet happened) or 1 (the event has happened). Once an event has happened, the 1 is carried over to all later years.

The age increases by one in every row per patient as time progresses.

The patient ID and initials remain the same, as does the weight.
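Putting the rules above together, one possible end-to-end sketch in pandas could look like the following. It is not a definitive solution and rests on several assumptions: the second event column is named `Event2`, dates come as either DD-MM-YYYY or YYYY-MM-DD (both appear in the sample data), an empty string means the event never happened, an event counts as 1 if its date falls on or before the last day of the row's period, and age goes up by one per calendar year rather than on a birthday. The names `parse_date`, `wide` and `long_df` are made up for illustration.

```python
import pandas as pd

temp = [['K98R', 'AB', 34, '27-07-2010', '17-08-2013', '2008-03-01', '2011-05-02', 44],
        ['S33T', 'ES', 55, '2009-07-23', '2012-03-12', '2010-09-17', '', 76]]
cols = ['ID', 'Initials', 'Age', 'Entry', 'Exit', 'Event1', 'Event2', 'Weight']
wide = pd.DataFrame(temp, columns=cols)

def parse_date(text):
    """Parse 'DD-MM-YYYY' or 'YYYY-MM-DD'; an empty string becomes NaT."""
    if not text:
        return pd.NaT
    # ISO dates have a dash right after the 4-digit year.
    fmt = '%Y-%m-%d' if text[4] == '-' else '%d-%m-%Y'
    return pd.to_datetime(text, format=fmt)

for col in ['Entry', 'Exit', 'Event1', 'Event2']:
    wide[col] = wide[col].map(parse_date)

rows = []
for rec in wide.to_dict('records'):
    first_year = rec['Entry'].year
    for year in range(first_year, rec['Exit'].year + 1):
        # Clip the calendar year to the patient's entry/exit window.
        start = max(rec['Entry'], pd.Timestamp(year=year, month=1, day=1))
        end = min(rec['Exit'], pd.Timestamp(year=year, month=12, day=31))
        rows.append({
            'ID': rec['ID'],
            'Initials': rec['Initials'],
            'Age': rec['Age'] + (year - first_year),
            'Entry': start,
            'Exit': end,
            # Comparing NaT with <= gives False, so a missing event stays 0.
            'Event1': int(rec['Event1'] <= end),
            'Event2': int(rec['Event2'] <= end),
            'Weight': rec['Weight'],
        })

long_df = pd.DataFrame(rows, columns=cols)
print(long_df)
```

The explicit loop is easy to follow for a handful of patients; for a large table a vectorised approach would probably be preferable, but the rules per row would stay the same.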

Could anyone please help with this? Thank you.
