Value from a past, potentially missing month in a DataFrame


Assume I have a DataFrame like the following:

Month,   Gender, State, Value
2010-01, M,      S1,    10
2010-02, M,      S1,    20
2010-05, M,      S1,    26
2010-03, F,      S2,    11

I want to add another column holding the Value for the same Gender and State from the previous month (or X months back), if it exists, i.e.:

Month,   Gender, State, Value, Last Value
2010-01, M,      S1,    10,    NaN
2010-02, M,      S1,    20,    10 
2010-05, M,      S1,    26,    NaN (there is no 2010-04 for M, S1)
2010-03, F,      S2,    11,    NaN

I know I have to groupby(['Gender', 'State']), but shift() then does not work: it simply shifts the data by a number of rows and is not aware of the period itself, so it gives the wrong result when a month is missing.
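
To make the problem concrete, here is a minimal sketch that rebuilds the sample data above and applies a plain grouped shift(). The column name `Naive Last Value` is only used for illustration. The 2010-05 row ends up paired with 2010-02, even though three months separate them:

```python
import pandas as pd

df = pd.DataFrame({
    "Month":  ["2010-01", "2010-02", "2010-05", "2010-03"],
    "Gender": ["M", "M", "M", "F"],
    "State":  ["S1", "S1", "S1", "S2"],
    "Value":  [10, 20, 26, 11],
})

# shift(1) moves values down by one row within each group,
# regardless of how many months lie between those rows.
df["Naive Last Value"] = df.groupby(["Gender", "State"])["Value"].shift(1)
print(df)
```

The 2010-05 row receives 20 (the 2010-02 value) instead of the NaN that is wanted.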


There is 1 answer

gatapia On BEST ANSWER

I found a way of doing this, though I'm not too happy about it:

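import pandas as pd

# Assumption: the full set of keys is derived from the data itself
all_genders = df['Gender'].unique()
all_states = df['State'].unique()
all_months = pd.period_range(df['Month'].min(), df['Month'].max(), freq='M').strftime('%Y-%m')
# every (Gender, State, Month) combination, in that order
full_index = pd.MultiIndex.from_product([all_genders, all_states, all_months], names=['Gender', 'State', 'Month'])
df = df.set_index(['Gender', 'State', 'Month'])
df = df.reindex(full_index) # create the missing rows (their Value is NaN)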

So basically, instead of dealing with missing rows in the data, let's just create the missing rows so that shift() works as expected.

I.e.:

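# shift within each (Gender, State) group so values do not leak from one group into the next
df['Last Value'] = df.groupby(level=['Gender', 'State'])['Value'].shift(1)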
...
df = df.reset_index() # go back to tabular format from this hierarchy
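
Putting the pieces together, the following is a rough end-to-end sketch of this approach on the sample data from the question. The helper name `add_last_value` and its `months_back` parameter are made up for illustration. It assumes that Month holds 'YYYY-MM' strings, that each (Gender, State, Month) combination appears at most once, and that Value is never NaN in the original rows:

```python
import pandas as pd

def add_last_value(df, months_back=1):
    """Hypothetical helper: add 'Last Value' taken from `months_back` months earlier."""
    keys = ["Gender", "State", "Month"]
    # Every month between the first and last observed one, as 'YYYY-MM' strings.
    all_months = pd.period_range(
        df["Month"].min(), df["Month"].max(), freq="M"
    ).strftime("%Y-%m")
    full_index = pd.MultiIndex.from_product(
        [df["Gender"].unique(), df["State"].unique(), all_months], names=keys
    )
    # Reindex so that missing months become rows with NaN values.
    full = df.set_index(keys).reindex(full_index)
    # Shift within each (Gender, State) group; consecutive rows are now
    # exactly one month apart, so a row shift equals a month shift.
    full["Last Value"] = (
        full.groupby(level=["Gender", "State"])["Value"].shift(months_back)
    )
    # Drop the filler rows again (assumption: original Values are never NaN).
    return full.dropna(subset=["Value"]).reset_index()

df = pd.DataFrame({
    "Month":  ["2010-01", "2010-02", "2010-05", "2010-03"],
    "Gender": ["M", "M", "M", "F"],
    "State":  ["S1", "S1", "S1", "S2"],
    "Value":  [10, 20, 26, 11],
})
result = add_last_value(df)
print(result)
```

Note that the reindex builds the full Gender x State x Month product, so the intermediate frame can get large when there are many groups or a long date range.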