sample_data = {
'Class': ['TEST', 'NOT TEST', 'TEST 123', 'SOMETHING ELSE', 'TEST'],
'Values': ['1.3 / 4.5', '1.5', '0.7 / 5.2', '123', '1.5 / 4.0']
}
df = pd.DataFrame(sample_data)
mask = df['Class'].str.startswith('TEST')
# I WANT TO DO SOMETHING LIKE THIS -->
df.loc[mask, ['COL1', 'COL2']] = df.loc[mask, 'Values'].str.split(' / ', expand=True)
df.head()
I am getting Nan for COL1 and COL2 for all rows. My expectation would be that I would get the split values for only the rows matching the mask and Nan for others.
I have read the pandas documentation (not enough apparently) and have been working with AI, but it is sending me in circles. Clearly I am lost in the weeds and not understanding something that I think should be very simple. Maybe something with copies and such.
I would expect, and what I want, is that I would get values in COL1 and COL2 for just rows which start with 'TEST'. Other rows would be NaN for COL1 and COL2.
I understand why the below code doesn't work (am working on a copy), but I don't understand why the original snipet doesn't work:
df_subset = df[mask]
df_subset[['COL1', 'COL2']] = df_subset['Values'].str.split(' / ', expand=True)
df.head() #NOTHING CHANGED AS EXPECTED BUT df_subset is as expected
Since the column names are not aligned with the output of
split(columns are 0/1), you should use the underlying numpy array:Output:
Output of
df.loc[mask, 'Values'].str.split(' / ', expand=True):