# Why is NumPy's `where` operation faster than pandas' `apply` function?

When creating a new column in a pandas DataFrame based on a condition, NumPy's `where` function outperforms the `apply` method in execution time. Why is that?

For example:

```python
import numpy as np

# Version 1: DataFrame.apply
df["log2FC"] = df.apply(lambda x: np.log2(x["C2Mean"] / x["C1Mean"]) if x["C1Mean"] > 0 else np.log2(x["C2Mean"]), axis=1)

# Version 2: np.where
df["log2FC"] = np.where(df["C1Mean"] == 0,
                        np.log2(df["C2Mean"]),
                        np.log2(df["C2Mean"] / df["C1Mean"]))
```
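For anyone who wants to reproduce the gap, here is a minimal benchmark sketch. The data are made up: `C1Mean` is assumed non-negative with some exact zeros and `C2Mean` strictly positive, so the `> 0` test in the `apply` version and the `== 0` test in the `np.where` version pick the same branch for every row (for negative `C1Mean` they would not). The helper names `with_apply` and `with_where` are hypothetical, and absolute timings will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Made-up data (assumption): C1Mean is non-negative with some exact zeros,
# C2Mean is strictly positive, so both versions choose the same branch per row.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "C1Mean": rng.choice([0.0, 1.0, 2.5, 4.0], size=n),
    "C2Mean": rng.uniform(0.5, 10.0, size=n),
})

def with_apply(frame):
    # Row-wise: the lambda runs once per row in the Python interpreter
    return frame.apply(
        lambda x: np.log2(x["C2Mean"] / x["C1Mean"]) if x["C1Mean"] > 0
        else np.log2(x["C2Mean"]),
        axis=1,
    )

def with_where(frame):
    # Column-wise: each NumPy call processes a whole column at once.
    # Both branches are evaluated for every row; errstate keeps any
    # divide-by-zero RuntimeWarning quiet.
    with np.errstate(divide="ignore"):
        return np.where(frame["C1Mean"] == 0,
                        np.log2(frame["C2Mean"]),
                        np.log2(frame["C2Mean"] / frame["C1Mean"]))

# Sanity check: both versions give the same numbers on this data
assert np.allclose(with_apply(df).to_numpy(), with_where(df))

print("apply:", timeit.timeit(lambda: with_apply(df), number=3))
print("where:", timeit.timeit(lambda: with_where(df), number=3))
```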

## Answer

This call to `apply` is row-wise iteration:

```python
df["log2FC"] = df.apply(lambda x: np.log2(x["C2Mean"] / x["C1Mean"]) if x["C1Mean"] > 0 else np.log2(x["C2Mean"]), axis=1)
```

`apply` is essentially syntactic sugar for a Python-level loop. Because you passed `axis=1`, your lambda is called once for every row.
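To make the "looping" point concrete, here is a small sketch on a hypothetical three-row frame: the function passed to `apply` receives one `Series` per row, and the result matches an explicit `iterrows` loop. The function name `log2fc` is illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"C1Mean": [2.0, 0.0, 4.0], "C2Mean": [8.0, 4.0, 4.0]})

seen_types = set()

def log2fc(row):
    # Record what apply passes in: each row arrives as a pandas Series
    seen_types.add(type(row))
    if row["C1Mean"] > 0:
        return np.log2(row["C2Mean"] / row["C1Mean"])
    return np.log2(row["C2Mean"])

via_apply = df.apply(log2fc, axis=1)

# Roughly what apply(axis=1) amounts to: a Python loop over the rows
via_loop = pd.Series([log2fc(row) for _, row in df.iterrows()], index=df.index)

print(via_apply.tolist())  # [2.0, 2.0, 0.0]
```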

```python
df["log2FC"] = np.where(df["C1Mean"] == 0,
                        np.log2(df["C2Mean"]),
                        np.log2(df["C2Mean"] / df["C1Mean"]))
```

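operates on entire columns at once, so it is vectorised: the comparison, the division and each `np.log2` call run once over a whole column instead of once per row.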
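A sketch of what the vectorised version does, using a toy frame that is an assumption for illustration. Note that `np.where` does not short-circuit: both branch arrays are computed for every row first, so the division also runs where `C1Mean == 0` and produces `inf` there before being discarded.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"C1Mean": [2.0, 0.0, 4.0], "C2Mean": [8.0, 4.0, 4.0]})

c1 = df["C1Mean"].to_numpy()
c2 = df["C2Mean"].to_numpy()

# Both branch arrays are computed for every row before np.where picks
# between them, so the division also runs where c1 == 0.
with np.errstate(divide="ignore"):
    ratio_branch = np.log2(c2 / c1)  # contains inf where c1 == 0
plain_branch = np.log2(c2)

# One vectorised selection over the whole column
df["log2FC"] = np.where(c1 == 0, plain_branch, ratio_branch)
print(df["log2FC"].tolist())  # [2.0, 2.0, 0.0]
```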

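The other factor is that `pandas` does more work than `numpy`: more checking, index alignment and so on. With `apply(axis=1)` it additionally builds a new `Series` object for every row before calling your lambda.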
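Index alignment is one concrete example of that extra work. In this sketch with two made-up Series, pandas matches labels before adding, while the equivalent NumPy arrays are simply added position by position.

```python
import numpy as np
import pandas as pd

# Made-up Series with partially overlapping labels
a = pd.Series([1.0, 2.0], index=["x", "y"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# pandas first aligns the two indexes (union of labels), then adds;
# labels present in only one operand become NaN.
aligned = a + b
print(aligned.to_dict())  # {'x': nan, 'y': 12.0, 'z': nan}

# NumPy has no labels: it adds element by element, by position.
positional = a.to_numpy() + b.to_numpy()
print(positional)  # [11. 22.]
```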

Your calls to `np.log2` gain you nothing in this context, because you pass them scalar values:

```python
np.log2(x["C2Mean"] / x["C1Mean"])
```

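Performance-wise, this is no better than calling `math.log2`; on a single scalar, `np.log2` is typically a little slower because of the overhead of dispatching a NumPy ufunc.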
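A sketch of the same row-wise logic written with `math.log2`, plus a rough per-call timing on a single scalar. The toy frame and the `value` variable are assumptions, and timings are machine-dependent, so no particular numbers are claimed.

```python
import math
import timeit

import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"C1Mean": [2.0, 0.0, 4.0], "C2Mean": [8.0, 4.0, 4.0]})

# Same row-wise logic with math.log2: each call receives plain scalars
df["log2FC"] = df.apply(
    lambda x: math.log2(x["C2Mean"] / x["C1Mean"]) if x["C1Mean"] > 0
    else math.log2(x["C2Mean"]),
    axis=1,
)
print(df["log2FC"].tolist())  # [2.0, 2.0, 0.0]

# Per-call cost on a single scalar; np.log2 goes through the ufunc
# machinery, so it is typically no faster (often slower) than math.log2.
value = 8.0
print("math.log2:", timeit.timeit(lambda: math.log2(value), number=100_000))
print("np.log2:  ", timeit.timeit(lambda: np.log2(value), number=100_000))
```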

Explaining why NumPy is significantly faster, or what vectorisation is, is beyond the scope of this question; see the question "What is vectorization?" for that.

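The essential point is that NumPy runs its element-wise loops in compiled code (C, and for some routines external libraries written in C or Fortran) instead of in the Python interpreter, so it avoids the per-element overhead of executing Python code.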
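To isolate the interpreter-versus-compiled-loop effect without pandas involved, here is a minimal sketch comparing an explicit Python loop with a single `np.log2` call over the same array. The randomly generated data and the helper names `python_loop` and `vectorised` are purely illustrative.

```python
import math
import timeit

import numpy as np

# Illustrative data: strictly positive so log2 is defined everywhere
values = np.random.default_rng(0).uniform(0.5, 10.0, size=100_000)

def python_loop(arr):
    # The interpreter executes this loop body once per element
    return np.array([math.log2(v) for v in arr])

def vectorised(arr):
    # One call: NumPy's compiled loop runs over the whole array
    return np.log2(arr)

# Same results up to floating-point rounding
assert np.allclose(python_loop(values), vectorised(values))

print("python loop:", timeit.timeit(lambda: python_loop(values), number=10))
print("np.log2:    ", timeit.timeit(lambda: vectorised(values), number=10))
```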