I have a sample dataframe that looks like this:
import pandas as pd

data = {
    "ID": [1, 2, 3],
    "A": ["", "", ""],
    "B": [2, 3, 1],
    "C": [1, 2, 0],
    "var_i3": [0, 0, 0],
    "var_i4": [0, 0, 0],
    "var_i5": [0, 0, 0],
    "var_i6": [0, 0, 0],
}
df = pd.DataFrame(data)
df
I would like to assign specific values to the different "var_i#" columns based on specific conditions.
Example:
If A is null and B is equal to 2 or 3, then "var_i3" should be 0, "var_i4" should be 1, and "var_i5" should be 0.
I have tried the following:
def process(row):
    if pd.isnull(row["A"]) and row["B"] in [2,3]:
        return row["var_i3"] == 0 & row["var_i4"] == 1 & row["var_i5"] == 0
    elif row["C"] == 0:
        return row["var_i6"] == 13
    else:
        if row["C"] >= 1:
            return row["var_i6"] == 0
    return row
df = df.apply(process, axis=1)
df
I'm not sure what the syntax is for setting several columns as the outcome of a single condition.
I also tried to use np.where:
def process(row):
    np.where(df["A"].isnull & df["B"] in [2,3], df["var_i3"] == 0 & df["var_i4"] == 1 & df["var_i5"] == 0,
             np.where(df["C"] == 0, df["var_i6"] == 13, np.where(df["C"] == 1, df["var_i6"] == 0, row)))
df = df.apply(process)
df
Could you provide any feedback on what is wrong in my code?
You can do this directly with an assignment. It looks something like this:
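A minimal sketch of such an assignment, using the sample frame from the question. One assumption is labeled in the code: the sample data stores empty strings in A rather than NaN, so the mask treats both as "null". The helper name `a_is_null` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "A": ["", "", ""],
    "B": [2, 3, 1],
    "C": [1, 2, 0],
    "var_i3": [0, 0, 0],
    "var_i4": [0, 0, 0],
    "var_i5": [0, 0, 0],
    "var_i6": [0, 0, 0],
})

# Assumption: an empty string in A counts as "null" alongside real NaN/None.
a_is_null = df["A"].isnull() | df["A"].eq("")

# The boolean mask picks the rows; .loc writes the scalar into every picked row.
# Note isin() instead of `in`, and `=` (assignment) instead of `==` (comparison).
df.loc[a_is_null & df["B"].isin([2, 3]), "var_i4"] = 1
```

No `apply` and no row-wise function is needed here: the condition is evaluated for the whole column at once.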
Similarly for any of the other columns. You can even save the condition in a variable to reuse it:
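As a sketch of that reuse, the condition can be stored once and applied to each target column. The `elif`/`else` branches from the question are mirrored by negating the stored mask. The name `cond` and the empty-string handling are assumptions, not something the question specifies.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "A": ["", "", ""],
    "B": [2, 3, 1],
    "C": [1, 2, 0],
    "var_i3": [0, 0, 0],
    "var_i4": [0, 0, 0],
    "var_i5": [0, 0, 0],
    "var_i6": [0, 0, 0],
})

# Assumption: "" in A is treated like a missing value.
cond = (df["A"].isnull() | df["A"].eq("")) & df["B"].isin([2, 3])

# Reuse the same mask for every column that depends on it.
df.loc[cond, "var_i3"] = 0
df.loc[cond, "var_i4"] = 1
df.loc[cond, "var_i5"] = 0

# Rows that did not match fall through to the C-based rules,
# like the elif/else branches in the question.
df.loc[~cond & df["C"].eq(0), "var_i6"] = 13
df.loc[~cond & df["C"].ge(1), "var_i6"] = 0
```

Each comparison is wrapped in parentheses or written as a method call because `&` binds more tightly than `==` in Python.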
Warning: This code is untested, but should illustrate the general principle. I recommend that you read more about "broadcasting" in pandas to understand this better.
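For comparison, the np.where attempt from the question can be written in vectorized form along these lines. This is a sketch under the same assumption that empty strings in A count as null. np.where takes a condition, the value to use where it is true, and the value to use elsewhere, and its result has to be assigned back to the column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["", "", ""],
    "B": [2, 3, 1],
    "C": [1, 2, 0],
    "var_i4": [0, 0, 0],
    "var_i6": [0, 0, 0],
})

# isnull must be called (isnull(), not isnull), and membership on a
# Series needs isin() rather than the `in` operator.
cond = (df["A"].isnull() | df["A"].eq("")) & df["B"].isin([2, 3])

# Scalars 1 and 13 are broadcast against the column-length condition;
# rows where the condition is false keep their existing value.
df["var_i4"] = np.where(cond, 1, df["var_i4"])
df["var_i6"] = np.where(~cond & (df["C"] == 0), 13, df["var_i6"])
```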