Groupby and ffill specified columns in Python

Question

Groupby and ffill specified columns in Python

80 views Asked by nilsinelabore At 13 October 2020 at 03:32

I want to sort values by id_, Code, Timestamp (since time order matters), then groupby d1 using id_ and Code, then forward fill using ffill for NaN for each group, on columns V1 and V2 only, while keeping other columns constant, and return the full table.

d1:


    Type_x  id_             Timestamp               V1   Code Type_y    V2
0   abcd    39-38-30-34     2012-09-20 23:46:05.870 35.5    2    NaN    0
1   abcd    39-38-30-34     2012-09-20 23:46:23.870 44.5    0    NaN    1
2   abcd    39-38-30-34     2012-09-20 23:48:07.870 43.5    0    NaN    1
3   abcd    39-38-30-34     2012-09-20 23:49:48.870 42.5    0    NaN    NaN
4   abcd    39-38-30-34     2012-09-20 23:50:44.870 34.5    2    NaN    NaN

Tried:

d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code'])['V1', 'V2'].ffill()

which only returned two columns:

        V1      V2
69659   21.5    NaN
300886  21.5    1.0
300887  21.5    0.0
70086   23.0    0.0
300955  23.0    1.0

How should I do it properly?

Original Q&A

There are 2 answers

wwnde On 13 October 2020 at 03:56

What did you need returned?

d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code']).ffill()

    

             Type_x     Timestamp    V1  Type_y   V2
1 abcd   39-38-30-34  23:46:23.870  44.5     NaN  1.0
2 abcd   39-38-30-34  23:48:07.870  43.5     NaN  1.0
3 abcd   39-38-30-34  23:49:48.870  42.5     NaN  1.0
0 abcd   39-38-30-34  23:46:05.870  35.5     NaN  0.0
4 abcd  39-38-30-34-  23:50:44.870  34.5     NaN  0.0

Or

d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp']).groupby(['id_', 'Code']).ffill().dropna(1)
print(d2)

 

             Type_x     Timestamp    V1   V2
1 abcd   39-38-30-34  23:46:23.870  44.5  1.0
2 abcd   39-38-30-34  23:48:07.870  43.5  1.0
3 abcd   39-38-30-34  23:49:48.870  42.5  1.0
0 abcd   39-38-30-34  23:46:05.870  35.5  0.0
4 abcd  39-38-30-34-  23:50:44.870  34.5  0.0

**David Erickson** · Accepted Answer · 2020-10-13T04:01:45+00:00

You can use transform and do it column by column if you have other columns in your actual dataframe other than the columns you want to groupby and the columns you want to ffill:

d2 = d1.sort_values(by = ['id_', 'Code', 'Timestamp'])
d2['V1'] = d2.groupby(['id_', 'Code'])['V1'].transform(lambda x: x.ffill())
d2['V2'] = d2.groupby(['id_', 'Code'])['V2'].transform(lambda x: x.ffill())
d2
Out[1]: 
  Type_x          id_                Timestamp    V1  Code  Type_y   V2
1  abcd   39-38-30-34  2012-09-20 23:46:23.870  44.5  0    NaN      1.0
2  abcd   39-38-30-34  2012-09-20 23:48:07.870  43.5  0    NaN      1.0
3  abcd   39-38-30-34  2012-09-20 23:49:48.870  42.5  0    NaN      1.0
0  abcd   39-38-30-34  2012-09-20 23:46:05.870  35.5  2    NaN      0.0
4  abcd   39-38-30-34  2012-09-20 23:50:44.870  34.5  2    NaN      0.0

TechQA.

Groupby and ffill specified columns in Python

There are 2 answers

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in PANDAS-RESAMPLE

Popular Questions

Popular Tags

Trending Questions