I have a pandas DataFrame with multiple columns. I want to create a new column, weighted_sum, from the values in each row and a second, column-vector DataFrame called weight.
weighted_sum should have the following value:
row[weighted_sum] = row[col0]*weight[0] + row[col1]*weight[1] + row[col2]*weight[2] + ...
I found the function sum(axis=1), but it doesn't let me multiply with weight.
Edit: I changed things a bit.
weight looks like this:
0
col1 0.5
col2 0.3
col3 0.2
df looks like this:
col1 col2 col3
1.0 2.2 3.5
6.1 0.4 1.2
df*weight returns a DataFrame full of NaN values.
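The problem is that you're multiplying a frame by a frame of a different size with a different row index. pandas aligns two frames on both row and column labels before multiplying, and here no labels match, so every cell comes out as NaN. Here's the solution: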
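To make the mismatch concrete, here is a minimal sketch that rebuilds the two frames from the question and compares their labels. The row labels 0 and 1 on df are an assumption, since the question does not show them.

```python
import pandas as pd

# Frames as described in the question; the 0, 1 row labels on df are assumed.
df = pd.DataFrame(
    [[1.0, 2.2, 3.5], [6.1, 0.4, 1.2]],
    columns=["col1", "col2", "col3"],
)
weight = pd.DataFrame({0: [0.5, 0.3, 0.2]}, index=["col1", "col2", "col3"])

# DataFrame * DataFrame aligns on BOTH row labels and column labels.
# df rows are 0, 1 while weight rows are col1..col3: nothing in common.
shared_rows = set(df.index) & set(weight.index)
# df columns are col1..col3 while weight's only column is 0: nothing in common.
shared_cols = set(df.columns) & set(weight.columns)

print(df.shape, weight.shape)    # shapes differ: (2, 3) versus (3, 1)
print(shared_rows, shared_cols)  # both sets are empty
```

What does line up is df's columns with weight's row index, which is what both fixes rely on.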
You can either access the column:
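A minimal sketch of the column-access approach, assuming the frames from the question (the 0, 1 row labels on df are an assumption). Selecting weight[0] gives a Series, and a Series is aligned against df's column labels rather than its rows:

```python
import pandas as pd

df = pd.DataFrame(
    [[1.0, 2.2, 3.5], [6.1, 0.4, 1.2]],
    columns=["col1", "col2", "col3"],
)
weight = pd.DataFrame({0: [0.5, 0.3, 0.2]}, index=["col1", "col2", "col3"])

# weight[0] is a Series indexed by col1..col3, so each column of df
# is scaled by the weight carrying the same label.
weighted = df * weight[0]

# Sum across the columns of each row to get the weighted sum.
weighted_sum = weighted.sum(axis=1)
print(weighted_sum)
```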
Or use dot to get back another DataFrame.
To bring it all together:
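A minimal sketch of the dot approach, again assuming the frames from the question with assumed row labels 0 and 1. The name result is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [[1.0, 2.2, 3.5], [6.1, 0.4, 1.2]],
    columns=["col1", "col2", "col3"],
)
weight = pd.DataFrame({0: [0.5, 0.3, 0.2]}, index=["col1", "col2", "col3"])

# df's column labels must match weight's row labels for dot to work.
# The result is a DataFrame with df's rows and weight's single column, 0.
result = df.dot(weight)

# Take that column as a Series and store it as the new column.
df["weighted_sum"] = result[0]
print(df)
```

Once weighted_sum has been added, df's columns no longer match weight's index, so a second call to df.dot(weight) would need the original columns selected first, for example df[["col1", "col2", "col3"]].dot(weight).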
Here are the timeits of each method, using a larger DataFrame.
For a wide DataFrame:
So, dot is faster and more readable.
NOTE: If any of your data contain NaNs then you should not use dot; you should use the multiply-and-sum method. dot cannot handle NaNs since it is just a thin wrapper around numpy.dot() (which doesn't handle NaNs).
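A minimal sketch of the NaN difference, using the question's data with one value replaced by NaN (that replacement and the variable names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Same shape as the question's df, but with col2 missing in the first row.
df = pd.DataFrame(
    [[1.0, np.nan, 3.5], [6.1, 0.4, 1.2]],
    columns=["col1", "col2", "col3"],
)
weight = pd.DataFrame({0: [0.5, 0.3, 0.2]}, index=["col1", "col2", "col3"])

# dot propagates NaN: any row containing a NaN yields NaN.
via_dot = df.dot(weight)[0]

# sum(axis=1) skips NaN by default (skipna=True), so the missing
# term is simply left out of the first row's total.
via_sum = (df * weight[0]).sum(axis=1)

print(via_dot)
print(via_sum)
```

Note that skipping a NaN is equivalent to treating that term as zero, which may or may not be what you want for a weighted sum.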