I have a pandas data frame with multiple columns. I want to create a new column weighted_sum from the values in the row and another column vector dataframe weight. weighted_sum should have the following value:
row[weighted_sum] = row[col0]*weight[0] + row[col1]*weight[1] + row[col2]*weight[2] + ...
I found the function sum(axis=1), but it doesn't let me multiply with weight.
Edit: I changed things a bit.
weight looks like this:
        0
col1  0.5
col2  0.3
col3  0.2
df looks like this:
col1  col2  col3
 1.0   2.2   3.5
 6.1   0.4   1.2
df*weight returns a dataframe full of NaN values.
The problem is that you're multiplying a frame with a frame of a different size with a different row index. Here's the solution:
You can either access the column:
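As a rough sketch of this first option: the construction of df and weight below is my reconstruction from the tables in the question, not the original code, and the names scaled and weighted_sum are only illustrative. Selecting column 0 gives a Series whose index matches the column labels of df, so the multiplication lines up column by column:

```python
import pandas as pd

# Reconstructed from the tables in the question (an assumption).
df = pd.DataFrame(
    [[1.0, 2.2, 3.5], [6.1, 0.4, 1.2]],
    columns=["col1", "col2", "col3"],
)
weight = pd.DataFrame({0: [0.5, 0.3, 0.2]}, index=["col1", "col2", "col3"])

# weight[0] is a Series indexed by col1..col3, so it aligns with the
# columns of df and each column is scaled by its own weight.
scaled = df * weight[0]

# Summing across each row then gives the weighted sum.
weighted_sum = scaled.sum(axis=1)
```

This is the multiply-and-sum method: one element-wise product followed by a row-wise sum.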
Or use dot to get back another DataFrame:
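A minimal sketch of the dot variant, again assuming df and weight are rebuilt from the tables in the question (the variable name result is mine). The column labels of df have to match the index of weight for the matrix product to be defined:

```python
import pandas as pd

# Reconstructed from the tables in the question (an assumption).
df = pd.DataFrame(
    [[1.0, 2.2, 3.5], [6.1, 0.4, 1.2]],
    columns=["col1", "col2", "col3"],
)
weight = pd.DataFrame({0: [0.5, 0.3, 0.2]}, index=["col1", "col2", "col3"])

# Matrix product of a (2 x 3) frame with a (3 x 1) frame: the result is a
# (2 x 1) DataFrame whose single column is labelled 0, like weight's column.
result = df.dot(weight)
```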
To bring it all together:
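One way this could look end to end, with the same reconstructed frames as an assumption. I pass weight[0] (a Series) to dot here so that the result is a Series, which keeps the column assignment unambiguous; that is my choice for the sketch, not necessarily the original code:

```python
import pandas as pd

# Reconstructed from the tables in the question (an assumption).
df = pd.DataFrame(
    [[1.0, 2.2, 3.5], [6.1, 0.4, 1.2]],
    columns=["col1", "col2", "col3"],
)
weight = pd.DataFrame({0: [0.5, 0.3, 0.2]}, index=["col1", "col2", "col3"])

# dot with a Series returns a Series indexed like df's rows, which can be
# assigned directly as the new column.
df["weighted_sum"] = df.dot(weight[0])
```

Compute the new column before adding it to df (or select the value columns explicitly), otherwise weighted_sum itself becomes one of the columns that dot tries to match against weight.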
Here are the timeits of each method, using a larger DataFrame.
For a wide DataFrame:
So, dot is faster and more readable.
NOTE: If any of your data contain NaNs then you should not use dot; you should use the multiply-and-sum method. dot cannot handle NaNs since it is just a thin wrapper around numpy.dot() (which doesn't handle NaNs).
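To illustrate the NaN caveat, here is a small sketch using the frames from the question with one value replaced by NaN. The missing value is something I introduced for the example, and the names via_dot and via_mul_sum are illustrative:

```python
import numpy as np
import pandas as pd

# Frames reconstructed from the question, with one value replaced by NaN
# for illustration (an assumption, not data from the original post).
df = pd.DataFrame(
    [[1.0, np.nan, 3.5], [6.1, 0.4, 1.2]],
    columns=["col1", "col2", "col3"],
)
weight = pd.DataFrame({0: [0.5, 0.3, 0.2]}, index=["col1", "col2", "col3"])

# dot propagates the NaN: the whole weighted sum for row 0 becomes NaN.
via_dot = df.dot(weight[0])

# Multiply-and-sum skips the NaN, because sum() uses skipna=True by default,
# so row 0 is the sum of the remaining terms only.
via_mul_sum = (df * weight[0]).sum(axis=1)
```

Note that skipping a NaN means the remaining weights no longer add up to 1 for that row, so whether that is the result you want depends on your data.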