I have two data frames with routes and linestrings:
df1 = {
"Route": ["AL013-AL015", "AL013-AL014", "AL013-AL011"],
"Linestring": ["LINESTRING (20.40350 42.06510, 19.70210 42.16300)", "LINESTRING (20.40350 42.06510, 19.84780 41.78380)", "LINESTRING (20.40350 42.06510, 20.25610 41.60390)"],
}
df2 = {
"Route": ["NO0A3-NO071", "NO0A3-NO091", "NO0A3-NO0A3"],
"Linestring": ["LINESTRING (8.53910 62.52120, 14.78250 66.69440)", "LINESTRING (8.53910 62.52120, 8.70540 59.49660)", "LINESTRING (8.53910 62.52120, 8.53910 62.52120)"],
}
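For reference, I turn these into GeoDataFrames by parsing the WKT strings (a minimal sketch; EPSG:4326 is my assumption based on the lon/lat-looking coordinates):

import geopandas as gpd
import pandas as pd

# Parse the WKT strings into shapely geometries; EPSG:4326 is assumed
# from the coordinate ranges.
df1 = gpd.GeoDataFrame(
    pd.DataFrame(df1),
    geometry=gpd.GeoSeries.from_wkt(df1["Linestring"]),
    crs="EPSG:4326",
)
df2 = gpd.GeoDataFrame(
    pd.DataFrame(df2),
    geometry=gpd.GeoSeries.from_wkt(df2["Linestring"]),
    crs="EPSG:4326",
)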
The problem is that they are large: df1 has roughly 2 million rows and df2 has about 300k rows. I want to attach the nearest df1 route to every row of df2 with geopandas.sjoin_nearest, like this:
df_new = gpd.sjoin_nearest(df2, df1, how='left')
However, the join takes a very long time. Is there any way to speed it up? While searching I came across spatial indexing in GeoPandas, but I am not sure whether it works with linestrings. Could someone explain how to apply spatial indexing to linestrings, or suggest any other way to speed up the join?
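For what it's worth, here is a sketch of what I tried with the spatial index so far (GeoDataFrame.sindex.nearest exists in recent GeoPandas versions; df1 and df2 are the GeoDataFrames built above, and I am not sure this is the intended approach for linestrings):

# Query df1's STRtree for the nearest geometry to each row of df2.
# With return_all=False every input geometry gets exactly one match;
# both returned rows are positional indices (row 0 -> df2, row 1 -> df1).
input_idx, tree_idx = df1.sindex.nearest(df2.geometry, return_all=False)

# Attach the nearest df1 route to each df2 row via positional lookup.
df2["nearest_route"] = df1["Route"].to_numpy()[tree_idx]

From what I can tell this is roughly what sjoin_nearest does internally, so I don't know whether it would actually be faster.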