Pandas Boolean Filter with Assignment resulting in NaN

Question

Asked by Arthur Lin on 07 March 2022 at 21:07

I'm curious why this toy example for simultaneous boolean index + assignment in Pandas doesn't work:

import pandas as pd

df = pd.DataFrame({'Source': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Period': ['1 hr', '1 hr', '1 hr', '24 hr', '24 hr', '24 hr'],
                   'CO': [1.1, 1.2, 1.3, 2.1, 2.2, 2.3],
                   'DPM': [11.1, 11.2, 11.3, 12.1, 12.2, 12.3],
                   'NOx': [21.1, 21.2, 21.3, 22.1, 22.2, 22.3]})

The resulting toy DataFrame is here:

  Source Period   CO   DPM   NOx
0      A   1 hr  1.1  11.1  21.1
1      B   1 hr  1.2  11.2  21.2
2      C   1 hr  1.3  11.3  21.3
3      A  24 hr  2.1  12.1  22.1
4      B  24 hr  2.2  12.2  22.2
5      C  24 hr  2.3  12.3  22.3

Now, I want to take the 24 hr values and assign them to the 1 hr rows for sources A and B. The final DataFrame should look like this:

  Source Period   CO   DPM   NOx
0      A   1 hr  2.1  12.1  22.1
1      B   1 hr  2.2  12.2  22.2
2      C   1 hr  1.3  11.3  21.3
3      A  24 hr  2.1  12.1  22.1
4      B  24 hr  2.2  12.2  22.2
5      C  24 hr  2.3  12.3  22.3

I tried to do the following command:

df.loc[df['Source'].isin(['A', 'B']) & (df['Period'] == '1 hr'), ['CO', 'DPM', 'NOx']] = \
    df.loc[df['Source'].isin(['A', 'B']) & (df['Period'] == '24 hr'), ['CO', 'DPM', 'NOx']]

but the targeted cells end up as NaN:

  Source Period   CO   DPM   NOx
0      A   1 hr  NaN   NaN   NaN
1      B   1 hr  NaN   NaN   NaN
2      C   1 hr  1.3  11.3  21.3
3      A  24 hr  2.1  12.1  22.1
4      B  24 hr  2.2  12.2  22.2
5      C  24 hr  2.3  12.3  22.3

The filter expressions on both the left- and right-hand sides of the assignment select the correct rows, and both select the same number of rows, so the problem seems to be in the assignment itself. How do I do this correctly? Note that I only want the CO, DPM, and NOx values to change, not any other columns.
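A quick way to see what goes wrong is to compare the index labels of the two selections. In pandas, when the right-hand side of a .loc assignment is a DataFrame, it is aligned to the target on index labels rather than matched by position. The sketch below rebuilds the toy frame and reindexes the right-hand selection onto the target labels, which is roughly what that alignment amounts to; the names lhs_mask, rhs_mask, and aligned are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Source': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Period': ['1 hr', '1 hr', '1 hr', '24 hr', '24 hr', '24 hr'],
                   'CO': [1.1, 1.2, 1.3, 2.1, 2.2, 2.3],
                   'DPM': [11.1, 11.2, 11.3, 12.1, 12.2, 12.3],
                   'NOx': [21.1, 21.2, 21.3, 22.1, 22.2, 22.3]})
cols = ['CO', 'DPM', 'NOx']

lhs_mask = df['Source'].isin(['A', 'B']) & (df['Period'] == '1 hr')
rhs_mask = df['Source'].isin(['A', 'B']) & (df['Period'] == '24 hr')

print(df.index[lhs_mask].tolist())  # [0, 1]
print(df.index[rhs_mask].tolist())  # [3, 4]

# Look up the target labels (0, 1) in the right-hand selection (labels 3, 4).
# No label matches, so every cell comes back as NaN.
aligned = df.loc[rhs_mask, cols].reindex(df.index[lhs_mask])
print(aligned)
```

Same row count is not enough: the labels have to match, and here they never do.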

Original Q&A

**AudioBubble** · Accepted Answer · 2022-03-07T21:19:26+00:00

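The problem is that the indexes don't match. When the right-hand side of a .loc assignment is a DataFrame, pandas aligns it to the target on index labels, not by position. Your right-hand selection carries labels 3 and 4 while the target rows are 0 and 1, so nothing matches and the target cells are filled with NaN. You can get around that by assigning the underlying NumPy array, which has no index and is therefore written by position: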

msk = (df['Period'] == '24 hr')
filt = df['Source'].isin(['A', 'B'])
cols = ['CO', 'DPM', 'NOx']
# .to_numpy() strips the index, so the values are written by position instead of aligned by label
df.loc[~msk & filt, cols] = df.loc[msk & filt, cols].to_numpy()

Output:

  Source Period   CO   DPM   NOx
0      A   1 hr  2.1  12.1  22.1
1      B   1 hr  2.2  12.2  22.2
2      C   1 hr  1.3  11.3  21.3
3      A  24 hr  2.1  12.1  22.1
4      B  24 hr  2.2  12.2  22.2
5      C  24 hr  2.3  12.3  22.3

Note that this matches rows purely by position, so it only works as expected if each "Source" has exactly one "1 hr" row and one "24 hr" row, and both selections list the sources in the same order.
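If the same-order assumption can't be guaranteed, one option (a variation, not part of the original answer) is to match rows on Source explicitly instead of by position. This sketch builds a lookup of the "24 hr" values keyed by Source, picks the entry for each "1 hr" row's own source, and relabels the result with the target rows' index so that normal label alignment does the rest. It still assumes each source has exactly one "24 hr" row; a target source with no "24 hr" row would receive NaN. The names target, donor, lookup, and replacement are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Source': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Period': ['1 hr', '1 hr', '1 hr', '24 hr', '24 hr', '24 hr'],
                   'CO': [1.1, 1.2, 1.3, 2.1, 2.2, 2.3],
                   'DPM': [11.1, 11.2, 11.3, 12.1, 12.2, 12.3],
                   'NOx': [21.1, 21.2, 21.3, 22.1, 22.2, 22.3]})
cols = ['CO', 'DPM', 'NOx']
sources = ['A', 'B']

target = df['Source'].isin(sources) & (df['Period'] == '1 hr')
donor = df['Source'].isin(sources) & (df['Period'] == '24 hr')

# '24 hr' values keyed by Source (assumes one '24 hr' row per source)
lookup = df.loc[donor].set_index('Source')[cols]

# For each target row, fetch the lookup row for its own Source ...
replacement = lookup.reindex(df.loc[target, 'Source'].to_numpy())
# ... then give it the target rows' labels so the assignment aligns correctly
replacement.index = df.index[target]

df.loc[target, cols] = replacement
print(df)
```

Because the match is made on Source values, reordering the rows of df does not change which values each "1 hr" row receives.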

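You could also use groupby + transform('last'), which returns a result with the same index as the rows being assigned, so label alignment works in your favour: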

cols = ['CO', 'DPM', 'NOx']
filt = df['Source'].isin(['A','B'])
df.loc[filt, cols] = df[filt].groupby('Source')[cols].transform('last')
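A self-contained sketch of the groupby approach, useful for checking what transform('last') produces before assigning it. For each Source group it takes the last row in frame order and broadcasts it to every row of that group under the original labels. "Last" means last in row order, so this relies on each "24 hr" row appearing after its "1 hr" counterpart, as it does in the toy frame; the name last_vals is illustrative. The "24 hr" rows are also overwritten, but only with their own values.

```python
import pandas as pd

df = pd.DataFrame({'Source': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Period': ['1 hr', '1 hr', '1 hr', '24 hr', '24 hr', '24 hr'],
                   'CO': [1.1, 1.2, 1.3, 2.1, 2.2, 2.3],
                   'DPM': [11.1, 11.2, 11.3, 12.1, 12.2, 12.3],
                   'NOx': [21.1, 21.2, 21.3, 22.1, 22.2, 22.3]})
cols = ['CO', 'DPM', 'NOx']
filt = df['Source'].isin(['A', 'B'])

# One row per original row of df[filt], holding that Source's last values
last_vals = df[filt].groupby('Source')[cols].transform('last')
print(last_vals.index.tolist())  # [0, 1, 3, 4], the same labels as df[filt]

df.loc[filt, cols] = last_vals
print(df)
```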
