I'm trying to apply the Haversine calculation to a pandas DataFrame.
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 3956
    return c * r
This works when using the following code:
haversine(-73.9881286621093,40.7320289611816,-73.9901733398437,40.7566795349121)
However, when I use it against a Pandas DataFrame as such:
train_data['Distance_Travelled'] = train_data.apply(lambda row: haversine(train_data['pickup_longitude'], train_data['pickup_latitude'], train_data['dropoff_longitude'], train_data['dropoff_latitude']), axis=1)
I get the following error.
"cannot convert the series to <class 'float'>"
I've tried numerous ways of casting but each attempt results in the same error. I know that math is expecting float, but I don't understand why the Pandas series can't be cast as a float.
What edit needs to be made for it to work and why?
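Don't use apply here, since it is not vectorized. The error itself comes from the lambda: it ignores row and passes train_data['pickup_longitude'] and the other columns as whole Series, and math.radians only accepts a single number. Instead, use the vectorized functions from numpy: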
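A minimal sketch of what such a vectorized version could look like. The name haversine_np and the r parameter are my own choices, not from the question. The two-row train_data is sample data for illustration only: the first row reuses the coordinates from the question, the second is a made-up zero-length trip.

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2, r=3956):
    """Great-circle distance using numpy functions.

    Accepts scalars, numpy arrays, or pandas Series of decimal degrees.
    r=3956 is the radius used in the question (Earth's radius in miles).
    """
    # convert decimal degrees to radians, element-wise
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    # haversine formula, same steps as the scalar version
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    c = 2 * np.arcsin(np.sqrt(a))
    return c * r

# Sample data for illustration only (column names taken from the question)
train_data = pd.DataFrame({
    'pickup_longitude': [-73.9881286621093, -73.9881286621093],
    'pickup_latitude': [40.7320289611816, 40.7320289611816],
    'dropoff_longitude': [-73.9901733398437, -73.9881286621093],
    'dropoff_latitude': [40.7566795349121, 40.7320289611816],
})

# Pass whole columns; no apply and no lambda needed
train_data['Distance_Travelled'] = haversine_np(
    train_data['pickup_longitude'], train_data['pickup_latitude'],
    train_data['dropoff_longitude'], train_data['dropoff_latitude'],
)
```

If you do want to keep apply with the original math-based function, the lambda would have to read from row (for example row['pickup_longitude']) rather than from train_data, so that each call receives single floats. That works but will be much slower on a large DataFrame.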