How to group by trip id and find the straight distance traveled?

I have the following data:

Trip     Start_Lat   Start_Long   End_lat     End_Long    Starting_point   Ending_point
Trip_1   56.5624     -85.56845    58.568      45.568      A                B
Trip_1   58.568      45.568       -200.568    -290.568    B                C
Trip_1   -200.568    -290.568     56.5624     -85.56845   C                D
Trip_2   56.5624     -85.56845    -85.56845   -200.568    A                B
Trip_2   -85.56845   -200.568     -150.568    -190.568    B                C

I would like to find the circuity, which is defined as

   Circuity = Total distance travelled (A to B to C to D) - Straight-line distance (A to D)
              -----------------------------------------------------------------------------
                          Total distance travelled (A to B to C to D)
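
For illustration, the formula can be written as a small function. This is only a sketch: the name `circuity` and the sample distances are made up for the example and do not come from the data above.

```python
def circuity(total_distance, straight_line_distance):
    """Return (total - straight) / total, the share of travel that was detour."""
    if total_distance == 0:
        return float("nan")  # undefined when nothing was travelled
    return (total_distance - straight_line_distance) / total_distance

# Hypothetical trip: 120 km travelled over all legs, 90 km in a straight line
print(circuity(120.0, 90.0))  # 0.25
```

A value of 0 means the route was perfectly direct; a value of 1 means the trip ended where it started.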

I tried the following code:

    from geopy.distance import great_circle

    df['Distance'] = df.apply(lambda x: great_circle((x['Start_Lat'], x['Start_Long']), (x['End_lat'], x['End_Long'])).km, axis=1)
    df['Total_Distance'] = (df.groupby('Trip')['Distance'].shift(2) + df.groupby('Trip')['Distance'].shift(1) + df['Distance']).abs()

Could you help me find the straight-line distance and the circuity?

There is 1 answer

MaxU - stand with Ukraine (best answer)

UPDATE:

You may want to convert your values to numeric dtypes first:

df[['Start_Lat','Start_Long','End_lat','End_Long']] = \
df[['Start_Lat','Start_Long','End_lat','End_Long']].apply(pd.to_numeric, errors='coerce')
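
To see what `errors='coerce'` does in that conversion, here is a small sketch on a made-up frame. The `sample` name and its values are hypothetical, not the asker's data; the point is only that strings which cannot be parsed become `NaN` rather than raising an error.

```python
import pandas as pd

# Hypothetical frame with coordinates stored as strings, one of them malformed
sample = pd.DataFrame({
    "Start_Lat": ["56.5624", "58.568", "n/a"],
    "Start_Long": ["-85.56845", "45.568", "10.0"],
})

cols = ["Start_Lat", "Start_Long"]
# Convert column by column; unparseable entries are turned into NaN
sample[cols] = sample[cols].apply(pd.to_numeric, errors="coerce")

print(sample.dtypes)        # both columns are float64 after the conversion
print(sample.isna().sum())  # counts the NaN values produced by coercion
```

It is probably worth checking the `NaN` counts after converting, since any `NaN` coordinate will propagate into the distance for that leg.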

If I understand correctly, you can do it this way:

import numpy as np
import pandas as pd

# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    Slightly modified version of http://stackoverflow.com/a/29546836/2901002

    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians)

    All (lat, lon) coordinates must have numeric dtypes and be of equal length.

    """
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

    return earth_radius * 2 * np.arcsin(np.sqrt(a))

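def f(df):
    # straight-line (great-circle) distance from the first start point
    # of the trip to its last end point
    straight = haversine(df['Start_Lat'].iloc[0], df['Start_Long'].iloc[0],
                         df['End_lat'].iloc[-1], df['End_Long'].iloc[-1])
    # total distance travelled: sum of the distances of all legs
    total = haversine(df['Start_Lat'], df['Start_Long'],
                      df['End_lat'], df['End_Long']).sum()
    # circuity = (total - straight) / total
    return 1 - straight / total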

df.groupby('Trip').apply(f)

Result:

In [120]: df.groupby('Trip').apply(f)
Out[120]:
Trip
Trip_1    1.000000
Trip_2    0.499825
dtype: float64
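
Note that `Trip_1` comes out as exactly 1 because its last end point equals its first start point, so the straight-line distance is 0. Also, some latitudes in the sample data lie outside the valid range of -90 to 90; the haversine formula still returns a number for them, but it is not a meaningful distance.

As an alternative to `groupby().apply()`, the same quantity can be computed with one vectorized pass plus a named aggregation. This is a sketch under assumptions: the sample frame, the helper column `leg_km` and the intermediate names `lat0`, `lon0`, `lat1`, `lon1` are made up for illustration, and rows are assumed to already be in travel order within each trip.

```python
import numpy as np
import pandas as pd

def haversine(lat1, lon1, lat2, lon2, earth_radius=6371):
    """Great-circle distance in km; inputs are in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

# Hypothetical sample: T1 is a round trip, T2 is a single direct leg
df = pd.DataFrame({
    "Trip":       ["T1", "T1", "T2"],
    "Start_Lat":  [0.0, 0.0, 0.0],
    "Start_Long": [0.0, 1.0, 0.0],
    "End_lat":    [0.0, 0.0, 0.0],
    "End_Long":   [1.0, 0.0, 2.0],
})

# Distance of every leg, computed once for the whole frame
df["leg_km"] = haversine(df["Start_Lat"], df["Start_Long"],
                         df["End_lat"], df["End_Long"])

# Per trip: first start point, last end point, and summed leg distance
g = df.groupby("Trip").agg(
    lat0=("Start_Lat", "first"), lon0=("Start_Long", "first"),
    lat1=("End_lat", "last"), lon1=("End_Long", "last"),
    total_km=("leg_km", "sum"),
)
g["straight_km"] = haversine(g["lat0"], g["lon0"], g["lat1"], g["lon1"])
g["circuity"] = 1 - g["straight_km"] / g["total_km"]
print(g[["total_km", "straight_km", "circuity"]])
```

In this sample the round trip `T1` gets a circuity of 1 and the single-leg trip `T2` gets 0, which matches the definition in the question. One caveat: the `"first"` and `"last"` aggregations skip missing values, so coordinates coerced to `NaN` should be handled before relying on them.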