I am currently working with a large dataset containing coordinates, and I want to validate the quality of the data.
The DataFrame contains coordinates from all over Europe. To validate the quality, I want to calculate the deviation from the nearest road, since the data are sent from vehicles.
So far I have downloaded the europe.osm file and filtered it with osmfilter so that it contains roads only. The filtered OSM file is about 2 GB.
Next I want to use osmnx to create a graph from the file:
import osmnx
import os
G = osmnx.graph.graph_from_xml('../europe-roads.osm', simplify=True, retain_all=False)
This is where my first problem starts: osmnx does not seem able to handle a file of this size. It only works with smaller files (individual cities).
What I want to do in the end is use the get_nearest_edge() function to calculate the distance to the nearest edge:
orig_edge = osmnx.distance.get_nearest_edge(G, (52.393214, 13.133295), return_geom=False, return_dist=True)
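To make clear what I mean by "deviation": for a single coordinate and a single road segment, the value I am after is roughly what the sketch below computes. The function name point_to_segment_m and the flat local projection are my own assumptions for illustration; this is not how osmnx computes it internally, and it is only reasonable for short distances.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)


def point_to_segment_m(point, seg_start, seg_end):
    """Approximate distance in metres from `point` to the segment
    seg_start-seg_end. All arguments are (lat, lon) tuples in degrees."""
    lat0 = math.radians(point[0])

    def to_xy(p):
        # Local equirectangular projection centred on `point`,
        # so `point` itself maps to the origin (0, 0).
        x = math.radians(p[1] - point[1]) * math.cos(lat0) * EARTH_RADIUS_M
        y = math.radians(p[0] - point[0]) * EARTH_RADIUS_M
        return x, y

    ax, ay = to_xy(seg_start)
    bx, by = to_xy(seg_end)
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(ax, ay)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, (-ax * dx - ay * dy) / seg_len_sq))
    return math.hypot(ax + t * dx, ay + t * dy)


# Hypothetical road segment running east-west, 0.001 degrees north of the point.
d = point_to_segment_m(
    (52.393214, 13.133295),
    (52.394214, 13.132),
    (52.394214, 13.134),
)
```

The hard part is not this calculation but finding the nearest segment among all roads in Europe, which is what I hoped the graph would do for me.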
My idea was to drop all nodes from the OSM file, which would bring it down to about half the size (since I only need the edges). However, I can't create a graph from an OSM file with the nodes removed.
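As far as I understand, this fails because a way in OSM XML only stores references to node IDs, while the coordinates live on the node elements, so the edges have no geometry once the nodes are gone. The small sketch below illustrates this with a hand-written fragment; the IDs and coordinates are made up, and way_geometry is a hypothetical helper, not part of osmnx.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written OSM XML fragment (made-up IDs and coordinates).
OSM_SNIPPET = """<osm version="0.6">
  <node id="1" lat="52.3930" lon="13.1330"/>
  <node id="2" lat="52.3935" lon="13.1340"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""

root = ET.fromstring(OSM_SNIPPET)

# Coordinates are stored on <node> elements, keyed by node ID.
node_coords = {
    n.get("id"): (float(n.get("lat")), float(n.get("lon")))
    for n in root.iter("node")
}


def way_geometry(way, coords):
    """Resolve a way's <nd ref="..."> entries to (lat, lon) tuples.
    References to nodes missing from `coords` are skipped."""
    return [coords[nd.get("ref")] for nd in way.iter("nd") if nd.get("ref") in coords]


way = root.find("way")
with_nodes = way_geometry(way, node_coords)  # two coordinate pairs
without_nodes = way_geometry(way, {})        # nothing left to build an edge from
```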
Any ideas on how to solve this problem?
In the end, what counts is that I have a solution to measure the distance to the nearest road from coordinates all over Europe.