I'm curious why this toy example for simultaneous boolean index + assignment in Pandas doesn't work:
import pandas as pd

df = pd.DataFrame({'Source': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Period': ['1 hr', '1 hr', '1 hr', '24 hr', '24 hr', '24 hr'],
                   'CO': [1.1, 1.2, 1.3, 2.1, 2.2, 2.3],
                   'DPM': [11.1, 11.2, 11.3, 12.1, 12.2, 12.3],
                   'NOx': [21.1, 21.2, 21.3, 22.1, 22.2, 22.3]})
The resulting toy DataFrame is here:
Source Period CO DPM NOx
0 A 1 hr 1.1 11.1 21.1
1 B 1 hr 1.2 11.2 21.2
2 C 1 hr 1.3 11.3 21.3
3 A 24 hr 2.1 12.1 22.1
4 B 24 hr 2.2 12.2 22.2
5 C 24 hr 2.3 12.3 22.3
Now, I want the final DataFrame to take the 24 hr values and assign them to the 1 hr values for sources A and B. The final DataFrame should look like this:
Source Period CO DPM NOx
0 A 1 hr 2.1 12.1 22.1
1 B 1 hr 2.2 12.2 22.2
2 C 1 hr 1.3 11.3 21.3
3 A 24 hr 2.1 12.1 22.1
4 B 24 hr 2.2 12.2 22.2
5 C 24 hr 2.3 12.3 22.3
I tried the following command:
df.loc[df['Source'].isin(['A', 'B']) & (df['Period'] == '1 hr'), ['CO', 'DPM', 'NOx']] =\
df.loc[df['Source'].isin(['A', 'B']) & (df['Period'] == '24 hr'), ['CO', 'DPM', 'NOx']]
but the selected values in my DataFrame end up replaced with NaNs:
Source Period CO DPM NOx
0 A 1 hr NaN NaN NaN
1 B 1 hr NaN NaN NaN
2 C 1 hr 1.3 11.3 21.3
3 A 24 hr 2.1 12.1 22.1
4 B 24 hr 2.2 12.2 22.2
5 C 24 hr 2.3 12.3 22.3
The filter expressions on both the LHS and RHS of the assignment filter correctly and return the same number of rows, so it seems the assignment itself is where things go wrong. How do I do this correctly? Note that I only want the CO, DPM, and NOx values to change, not any other columns.
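The problem is that the indexes don't match. When you assign a DataFrame through .loc, pandas aligns the right-hand side on index labels: the target rows are labelled 0 and 1, the source rows are labelled 3 and 4, so nothing lines up and the target cells are filled with NaN. You can get around that issue by using the underlying numpy array, which carries no index: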
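A minimal sketch of that, rebuilding the question's df so it runs on its own. The names cols, is_ab, src and dst are just illustrative, and on older pandas versions .values could stand in for .to_numpy():

```python
import pandas as pd

df = pd.DataFrame({'Source': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Period': ['1 hr', '1 hr', '1 hr', '24 hr', '24 hr', '24 hr'],
                   'CO': [1.1, 1.2, 1.3, 2.1, 2.2, 2.3],
                   'DPM': [11.1, 11.2, 11.3, 12.1, 12.2, 12.3],
                   'NOx': [21.1, 21.2, 21.3, 22.1, 22.2, 22.3]})

cols = ['CO', 'DPM', 'NOx']
is_ab = df['Source'].isin(['A', 'B'])
src = is_ab & (df['Period'] == '24 hr')   # rows to copy from (labels 3, 4)
dst = is_ab & (df['Period'] == '1 hr')    # rows to overwrite (labels 0, 1)

# .to_numpy() drops the index, so the values are assigned by position
# instead of being aligned on the non-matching labels.
df.loc[dst, cols] = df.loc[src, cols].to_numpy()

print(df)
```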
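Output:
Source Period CO DPM NOx
0 A 1 hr 2.1 12.1 22.1
1 B 1 hr 2.2 12.2 22.2
2 C 1 hr 1.3 11.3 21.3
3 A 24 hr 2.1 12.1 22.1
4 B 24 hr 2.2 12.2 22.2
5 C 24 hr 2.3 12.3 22.3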
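Note that this only works as you expect if there is a one-to-one relation between "1 hr" and "24 hr" for each "Source" type and the rows appear in the same order in both selections, because the values are assigned by position.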
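If you are not sure that condition holds for your real data, a quick sanity check along these lines may help. This is only a sketch: the names src_sources, dst_sources and pairs_line_up are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Source': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Period': ['1 hr', '1 hr', '1 hr', '24 hr', '24 hr', '24 hr'],
                   'CO': [1.1, 1.2, 1.3, 2.1, 2.2, 2.3],
                   'DPM': [11.1, 11.2, 11.3, 12.1, 12.2, 12.3],
                   'NOx': [21.1, 21.2, 21.3, 22.1, 22.2, 22.3]})

is_ab = df['Source'].isin(['A', 'B'])
src_sources = df.loc[is_ab & (df['Period'] == '24 hr'), 'Source'].tolist()
dst_sources = df.loc[is_ab & (df['Period'] == '1 hr'), 'Source'].tolist()

# Positional assignment is only safe when both selections list
# the same sources in the same order.
pairs_line_up = src_sources == dst_sources
print(pairs_line_up)
```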
You could also use groupby + last:
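One way this could look, as a sketch rather than the only possible reading of the groupby + last idea. It uses transform('last') so the result keeps the original row labels and aligns cleanly on assignment; the names cols and m are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Source': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Period': ['1 hr', '1 hr', '1 hr', '24 hr', '24 hr', '24 hr'],
                   'CO': [1.1, 1.2, 1.3, 2.1, 2.2, 2.3],
                   'DPM': [11.1, 11.2, 11.3, 12.1, 12.2, 12.3],
                   'NOx': [21.1, 21.2, 21.3, 22.1, 22.2, 22.3]})

cols = ['CO', 'DPM', 'NOx']
m = df['Source'].isin(['A', 'B'])

# For every A/B row, take the last row of its Source group.
# The result is indexed like df[m], so index alignment now works.
df.loc[m, cols] = df[m].groupby('Source')[cols].transform('last')

print(df)
```

This assumes the "24 hr" row is the last row of each Source group, as it is in the toy data; if that is not guaranteed, sort by Period first or filter explicitly.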