Pandas data frame use interpolate() partitioning with specific columns

Question

Pandas data frame use interpolate() partitioning with specific columns

51 views Asked by Nicolaesse At 09 January 2024 at 16:06

I have the following Pandas data frame (called df).

+--------+--------+------+--------+
| Person | Animal | Year | Number |
+--------+--------+------+--------+
| John   | Dogs   | 2000 | 2      |
| John   | Dogs   | 2001 | 2      |
| John   | Dogs   | 2002 | 2      |
| John   | Dogs   | 2003 | 2      |
| John   | Dogs   | 2004 | 2      |
| John   | Dogs   | 2005 | 2      |
| John   | Cats   | 2000 | 1      |
| John   | Cats   | 2001 | NaN    |
| John   | Cats   | 2002 | NaN    |
| John   | Cats   | 2003 | 4      |
| John   | Cats   | 2004 | 5      |
| John   | Cats   | 2005 | 6      |
| Peter  | Dogs   | 2000 | NaN    |
| Peter  | Dogs   | 2001 | 1      |
| Peter  | Dogs   | 2002 | NaN    |
| Peter  | Dogs   | 2003 | 5      |
| Peter  | Dogs   | 2004 | 5      |
| Peter  | Dogs   | 2005 | 5      |
| Peter  | Cats   | 2000 | NaN    |
| Peter  | Cats   | 2001 | 4      |
| Peter  | Cats   | 2002 | 4      |
| Peter  | Cats   | 2003 | 4      |
| Peter  | Cats   | 2004 | 4      |
| Peter  | Cats   | 2005 | 4      |
+--------+--------+------+--------+

My target is to get the following, which means using the interpolate method to fill the NaN values, but based on the other column value. In other words, it should

partition the df using the Person and Animal columns
order by Year (asc)
apply the interpolate method

.

+--------+--------+------+--------+
| Person | Animal | Year | Number |
+--------+--------+------+--------+
| John   | Dogs   | 2000 | 2      |
| John   | Dogs   | 2001 | 2      |
| John   | Dogs   | 2002 | 2      |
| John   | Dogs   | 2003 | 2      |
| John   | Dogs   | 2004 | 2      |
| John   | Dogs   | 2005 | 2      |
| John   | Cats   | 2000 | 1      |
| John   | Cats   | 2001 | 2      |
| John   | Cats   | 2002 | 3      |
| John   | Cats   | 2003 | 4      |
| John   | Cats   | 2004 | 5      |
| John   | Cats   | 2005 | 6      |
| Peter  | Dogs   | 2000 | NaN    |
| Peter  | Dogs   | 2001 | 1      |
| Peter  | Dogs   | 2002 | 3      |
| Peter  | Dogs   | 2003 | 5      |
| Peter  | Dogs   | 2004 | 5      |
| Peter  | Dogs   | 2005 | 5      |
| Peter  | Cats   | 2000 | NaN    |
| Peter  | Cats   | 2001 | 4      |
| Peter  | Cats   | 2002 | 4      |
| Peter  | Cats   | 2003 | 4      |
| Peter  | Cats   | 2004 | 4      |
| Peter  | Cats   | 2005 | 4      |
+--------+--------+------+--------+

What I have done

I can filter for each Person and each Animal and then apply the interpolate methods. Finally, merge all together, but this sounds dull and long if you have many columns.

Original Q&A

There are 2 answers

Laurent B. On 09 January 2024 at 20:19

Proposed script

Another readable solution :

def interpolate_values(group):
    group = group.sort_values('Year')
    group['Number'] = group['Number'].interpolate()
    return group

df['Number'] = (df.groupby(['Person', 'Animal'], group_keys=False)
                  .apply(interpolate_values)['Number'])

   Person Animal  Year  Number
0    John   Dogs  2000     2.0
1    John   Dogs  2001     2.0
2    John   Dogs  2002     2.0
3    John   Dogs  2003     2.0
4    John   Dogs  2004     2.0
5    John   Dogs  2005     2.0
6    John   Cats  2000     1.0
7    John   Cats  2001     2.0
8    John   Cats  2002     3.0
9    John   Cats  2003     4.0
10   John   Cats  2004     5.0
11   John   Cats  2005     6.0
12  Peter   Dogs  2000     NaN
13  Peter   Dogs  2001     1.0
14  Peter   Dogs  2002     3.0
15  Peter   Dogs  2003     5.0
16  Peter   Dogs  2004     5.0
17  Peter   Dogs  2005     5.0
18  Peter   Cats  2000     NaN
19  Peter   Cats  2001     4.0
20  Peter   Cats  2002     4.0
21  Peter   Cats  2003     4.0
22  Peter   Cats  2004     4.0
23  Peter   Cats  2005     4.0

**Corralien** · Accepted Answer · 2024-01-09T16:13:52+00:00

You can try:

df['Number'] = (df.sort_values('Year', ascending=True)
                  .groupby(['Person', 'Animal'])['Number']
                  .transform(lambda x: x.interpolate()))
print(df)

# Output
   Person Animal  Year  Number
0    John   Dogs  2000     2.0
1    John   Dogs  2001     2.0
2    John   Dogs  2002     2.0
3    John   Dogs  2003     2.0
4    John   Dogs  2004     2.0
5    John   Dogs  2005     2.0
6    John   Cats  2000     1.0
7    John   Cats  2001     2.0  # interpolate
8    John   Cats  2002     3.0  # interpolate
9    John   Cats  2003     4.0
10   John   Cats  2004     5.0
11   John   Cats  2005     6.0
12  Peter   Dogs  2000     NaN
13  Peter   Dogs  2001     1.0
14  Peter   Dogs  2002     3.0
15  Peter   Dogs  2003     5.0
16  Peter   Dogs  2004     5.0
17  Peter   Dogs  2005     5.0
18  Peter   Cats  2000     NaN
19  Peter   Cats  2001     4.0
20  Peter   Cats  2002     4.0
21  Peter   Cats  2003     4.0
22  Peter   Cats  2004     4.0
23  Peter   Cats  2005     4.0

For multiple columns, just use the same operation:

cols = ['Number']  # list of columns
df[cols] = (df.sort_values('Year', ascending=True)
              .groupby(['Person', 'Animal'])[cols]
              .transform(lambda x: x.interpolate()))

TechQA.

Pandas data frame use interpolate() partitioning with specific columns

There are 2 answers

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in DATAFRAME

Related Questions in INTERPOLATION

Related Questions in FILLNA

Popular Questions

Trending Questions