When creating a new column in a pandas DataFrame based on some condition, numpy's `where` outperforms `apply` in terms of execution time. Why is that so?

For example:

```
# Using apply
df["log2FC"] = df.apply(lambda x: np.log2(x["C2Mean"] / x["C1Mean"]) if x["C1Mean"] > 0 else np.log2(x["C2Mean"]), axis=1)

# Using np.where
df["log2FC"] = np.where(df["C1Mean"] == 0,
                        np.log2(df["C2Mean"]),
                        np.log2(df["C2Mean"] / df["C1Mean"]))
```

This call to `apply` is row-wise iteration: `apply` is just syntactic sugar for looping, and you passed `axis=1`, so it's row-wise. Your other snippet is acting on the entire columns, so it's vectorised.
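To make the difference concrete, here is a minimal sketch. The DataFrame contents and the helper name `log2fc` are made up for illustration; only the column names come from the question. It shows that `apply(..., axis=1)` behaves like an explicit Python loop over rows, while the `np.where` version evaluates each expression once over whole columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen only for illustration.
df = pd.DataFrame({"C1Mean": [0.0, 2.0, 4.0, 8.0],
                   "C2Mean": [4.0, 8.0, 4.0, 2.0]})

def log2fc(row):
    # Called once per row; `row` is a Series holding that row's values.
    if row["C1Mean"] > 0:
        return np.log2(row["C2Mean"] / row["C1Mean"])
    return np.log2(row["C2Mean"])

# Row-wise: one Python-level function call per row.
via_apply = df.apply(log2fc, axis=1)

# Roughly what apply(axis=1) amounts to: an explicit loop over the rows.
via_loop = pd.Series([log2fc(row) for _, row in df.iterrows()], index=df.index)

# Vectorised: each expression runs once over entire columns.
# np.where evaluates both branches, so floating-point warnings from
# rows where C1Mean == 0 are silenced here.
with np.errstate(divide="ignore", invalid="ignore"):
    via_where = np.where(df["C1Mean"] == 0,
                         np.log2(df["C2Mean"]),
                         np.log2(df["C2Mean"] / df["C1Mean"]))
```

All three produce the same values; they differ only in how many times Python-level code runs.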

The other thing is that `pandas` is performing more checking, index alignment, etc. than `numpy`. Your calls to `np.log2` gain you nothing in this context, as you pass scalar values: performance-wise it would be no better than calling `math.log2`.
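As a small illustration of that last point (the values are arbitrary): on a single scalar, `np.log2` and `math.log2` return the same number, so the numpy function only pays off when it is handed a whole array in one call.

```python
import math
import numpy as np

x = 8.0
scalar_math = math.log2(x)   # plain Python float in, float out
scalar_numpy = np.log2(x)    # same value, returned as a numpy scalar

values = np.array([1.0, 2.0, 4.0, 8.0])

# One call, with the loop over elements running inside numpy.
one_call = np.log2(values)

# One Python-level call per element, which is what happens inside apply.
per_element = np.array([math.log2(v) for v in values])
```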

Explaining why numpy is significantly faster, or what vectorisation is, is beyond the scope of this question; see "What is vectorization?" for that.

The essential thing here is that numpy can and will use external libraries written in C or Fortran, whose compiled loops are much faster than loops executed by the Python interpreter.
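If you want to measure the gap yourself, a rough timing harness could look like the sketch below. The frame size, the random data and the helper names (`timed`, `row_wise`, `vectorised`) are assumptions made for illustration, and the timings will vary by machine and library version.

```python
import time
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000  # hypothetical size; increase it to make the gap more visible

frame = pd.DataFrame({"C1Mean": rng.integers(0, 5, n).astype(float),
                      "C2Mean": rng.uniform(1.0, 100.0, n)})

def timed(fn):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

def row_wise():
    # The lambda is invoked once per row.
    return frame.apply(
        lambda r: np.log2(r["C2Mean"] / r["C1Mean"]) if r["C1Mean"] > 0
        else np.log2(r["C2Mean"]),
        axis=1,
    ).to_numpy()

def vectorised():
    # Both branches are computed for every row, then np.where selects.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(frame["C1Mean"] == 0,
                        np.log2(frame["C2Mean"]),
                        np.log2(frame["C2Mean"] / frame["C1Mean"]))

slow, t_apply = timed(row_wise)
fast, t_where = timed(vectorised)
print(f"apply: {t_apply:.4f}s  np.where: {t_where:.4f}s")
```

A single run like this is only indicative; `timeit` with several repetitions gives steadier numbers.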