How can I extract location information from latitude and longitude columns in a cuDF dataframe?

I have a dataset in a pandas DataFrame, which I converted to cuDF to speed up extracting location information from its latitude and longitude columns. The CPU code that I have is as follows:

import pandas as pd
from geopy.geocoders import Nominatim
from math import radians, sin, cos, sqrt, atan2
import time

cab_df = filtered_cab_data.copy()

geolocator = Nominatim(user_agent="reverse_geocoding_example")

def haversine_distance(lat1, lon1, lat2, lon2):
    # Convert latitude and longitude from degrees to radians
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])

    # Radius of the Earth in kilometers
    R = 6371

    # Haversine formula
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    distance = R * c

    return distance

def reverse_geocode(row):
    lat = row['pickup_latitude']
    lon = row['pickup_longitude']
    retries = 3  # Number of retries
    for _ in range(retries):
        try:
            location = geolocator.reverse((lat, lon))
            return location.address if location else None
        except Exception as e:
            print("Error:", e)
            time.sleep(2)  # Wait for a while before retrying
    return None

cab_df['pickup_location'] = cab_df.apply(reverse_geocode, axis=1)

def reverse_geocode_dropoff(row):
    lat = row['dropoff_latitude']
    lon = row['dropoff_longitude']
    retries = 3  # Number of retries
    for _ in range(retries):
        try:
            location = geolocator.reverse((lat, lon))
            return location.address if location else None
        except Exception as e:
            print("Error:", e)
            time.sleep(2)  # Wait for a while before retrying
    return None

cab_df['dropoff_location'] = cab_df.apply(reverse_geocode_dropoff, axis=1)

cab_df.head()

How can I modify this code to run on a GPU using a cuDF dataframe? I have attempted a few modifications with cuSpatial and cuProj, but all of them resulted in a TypeError.

I have also explored various libraries and APIs used with pandas for extracting location information from latitude and longitude data (such as geopy, geopandas, and Nominatim), but with no success. Most of them raise a TypeError.

There is 1 answer

jarmak-nv

A couple of thoughts for you:

  • A function passed to cuDF's .apply() has to be compilable for the GPU, which rules out calls to an external service such as geolocator.reverse(). I'd recommend checking out this guide first when using the .apply() function.
  • cuSpatial has a notebook that may do exactly what you're looking for: reverse geocoding on the GPU. Try checking it out and comparing it to the errors you're seeing.
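
To make the second point a bit more concrete: reverse geocoding without a web service usually comes down to a point-in-polygon lookup against a set of boundary polygons, which is pure arithmetic and therefore the kind of work a GPU can do (cuSpatial provides point-in-polygon functionality). Below is a minimal CPU sketch of that idea in plain Python, not the notebook's code. The zone names and rectangles in ZONES are made up for illustration, and reverse_geocode_offline is a hypothetical helper; a real setup would load actual boundary polygons and return zone names rather than the street addresses Nominatim gives you.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

# Hypothetical zones: rough rectangles, NOT real boundaries.
ZONES: Dict[str, List[Point]] = {
    "zone_a": [(-74.02, 40.70), (-73.97, 40.70), (-73.97, 40.75), (-74.02, 40.75)],
    "zone_b": [(-73.97, 40.70), (-73.92, 40.70), (-73.92, 40.75), (-73.97, 40.75)],
}


def point_in_polygon(lon: float, lat: float, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many edges a ray going east from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def reverse_geocode_offline(
    lon: float, lat: float, zones: Dict[str, List[Point]]
) -> Optional[str]:
    """Return the name of the first zone containing the point, or None."""
    for name, polygon in zones.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical pickup coordinates as (longitude, latitude) pairs.
pickups = [(-74.00, 40.72), (-73.95, 40.73), (-73.80, 40.60)]
labels = [reverse_geocode_offline(lon, lat, ZONES) for lon, lat in pickups]
print(labels)  # ['zone_a', 'zone_b', None]
```

Because the lookup involves no network calls or Python objects beyond numbers, the same logic maps naturally onto a GPU library's point-in-polygon routine, whereas the Nominatim-based functions in the question cannot be compiled for the GPU at all.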