I have a set of geographical points (lat, lon) and I want to compute a directed graph where:
- the nodes are those points
- an edge X->Y (between nodes X and Y) exists if the fastest path between X and Y doesn't pass through another node Z.
I am able to compute the duration of the path between any pair of nodes. Right now, I'm doing the following:
- compute the durations between every pair of nodes
- for every pair of nodes X,Y, there is an edge between X and Y if there is no node Z such that the duration of X->Z plus the duration of Z->Y is the same as the duration of X->Y.
I have tested this approach for a subset of the nodes and it seems to work, but since I have around 2000 nodes and the computation of the duration between nodes is computationally expensive (because it involves calculating the shortest path), I would like to know if there is a better approach.
Some additional (probably not relevant) info:
- The nodes are bus stop locations, taken from a GTFS feed
- I'm calculating the shortest paths durations using http://project-osrm.org/
Any help will be greatly appreciated
I would start by running a Delaunay triangulation, this will give you an idea of what the path structure "should" look like (in theory). In theory, once you have the delaunary triangulation, it is easy to find the shortest paths. In practice however, you are using road networks instead of point-to-point distances, so you have to challenge those assumptions.
First build a way that given the Delaunay triangulation. Example of what the triangulation might look like below. Let's say that points HJG are within a city center and the points around it are a freeway that surrounds it. Say with a 65 mph or so speed limit. This means going from ABCG might be faster than AHG.
For every edge in the Delaunay triangulation. Determine the duration of the edge in both directions. This will give you your initial weights (it may also effect the triangulation if you can determine that the shortest path for AH is actually ABH). If there isn't a way to go around for faster routes. The ABCG being faster than AHG example. Then you are probably done at this point. If not you'll have to try to contract edges, which depending on your data, might require doing all routes anyway as you can't really be sure some obscure path allows it to be faster. Sadly, triangulation can't account for differences in speed between edges.
If you can set it up to tell you which point it does go through, then you can avoid having to run it on all paths. IE: If you can say have it tell you the fastest path AF, and it says it uses AHGF, then you can be reasonably sure that AHG and HGF are the shortest paths between AG and HF, and you don't need to run those. However, if you find that AF actually takes ABCEF instead, then there is clearly a great speed difference between the routes, and you should also check the fastest route to AG to ensure it's actually AHG. Whenever you find a faster edge, be sure to add to the triangulation. You may actually be able to add multiple edges. Basically the idea here is that the triangulation gives you an idea of what should be fastest routes if speed doesn't matter. However, you need to confirm or reject that hypothesis. Basically for each node. Find the shortest paths to all points via BFS. Take the longest path and explore it.
To aid in #3, it would likely help to vertex path covering of the edges in the graph. Create paths to all the furthest points from each point until all points are covered. IE: ABCD ABCE AHGF AHJKL AIK (also the reverse order as it is directional). Then you can run shortest path from AD, AE, AF, AL, and AK. If those paths match the expected paths, then you are done. If not, you'll be refining things along the way in the method used for #3 above. Worst case though, I'm not sure there's a way to get this much faster than all shortest paths in theory this approach is roughly O(VE).
In theory #4 requires starting at every vertex. However, vertices can be skipped if the connecting paths have been valid as being the shortest path, so I would likely recommend starting checking vertices with the least number of (un-validated?) edges. Though again, you may need to do all pairs to be 100% sure only the shortest edges connect each node.
For testing I would recommend doing this: Randomly select say 10-200 of those 2000 points. Run the algorithm on those 10-200 points. Then run all paths on those points. If they don't agree take a close look at why they don't agree. Feel free to add any "exceptional" cases to your answer and comment here. Repeat many times and see if there is any disagreement. Note all disagreements. If there are no disagreements, try running on the entire 2000 point set. I would strongly recommend doing this test after step 2. There is a chance that the Delaunay graph is very close to the optimal, depending on your dataset.