I have 8 traveling consultants who need to visit 155 groups across the continental United States. Is there a way to find the optimal 8 regions based on drive time using k-means clustering? I see there are some methods already implemented for other data sets, but they are not based on drive time. How will I need to manipulate my data set to make it usable?
Thank you in advance for any feedback. I am by no means a great coder; I have taken only a few introductory courses back in college.
I think you are looking for "path planning" rather than clustering. The traveling salesman problem comes to mind.
If you want to use clustering to find the individual regions, you should find the coordinates for each location with respect to some global frame. One example would be using latitude and longitude coordinates. Create an array X that's 155x2, where each row is a destination with the columns lat, long.
Then simply run MATLAB's kmeans on X, asking for 8 clusters, and it should work nicely. This should be enough to get you started.
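As a rough sketch of the steps above, something along these lines should run, assuming the Statistics and Machine Learning Toolbox is installed. The coordinates here are random placeholders standing in for the real 155 sites, and the names lat, lon, idx and C are just illustrative choices.

```matlab
% Placeholder data (an assumption): replace with the real 155 (lat, long) pairs.
rng(1);                              % fix the seed so the run is repeatable
lat = 25 + 24 * rand(155, 1);        % rough continental-US latitude range
lon = -125 + 58 * rand(155, 1);      % rough continental-US longitude range
X = [lat, lon];                      % 155x2, one destination per row

k = 8;                               % one region per consultant
% idx(i) is the region assigned to site i; C holds the 8 region centers.
% 'Replicates' reruns k-means from several random starts and keeps the best.
[idx, C] = kmeans(X, k, 'Replicates', 10);
```

Note that this treats a degree of longitude the same as a degree of latitude, which is only approximately right, so the regions are a starting point rather than a final answer.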
One issue with this approach is that it will group the sites by geographical location, which isn't always the same as shortest travel time. For instance, site B may look closer to A than site C does on the map, but getting from A to B requires going around a river, so the actual travel distance is 10 miles, whereas A to C is realistically 2.5 miles. Clearly A-C is the better choice, but using global position alone wouldn't take this into account.
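One possible way around this, if you can get a table of pairwise drive times or distances from some routing source, is to cluster on that table directly. Note that kmeans itself works on coordinates and not on a precomputed distance table, so the sketch below swaps in hierarchical clustering (linkage and cluster from the Statistics and Machine Learning Toolbox) instead. The matrix D is a toy example: the A-B and A-C values come from the river example, and the B-C value is made up.

```matlab
% Toy travel distances in miles between sites A, B, C (rows/columns in that order).
% A-B = 10 and A-C = 2.5 are from the river example; B-C = 11 is an assumption.
D = [ 0    10    2.5;
     10     0   11;
      2.5  11    0  ];

D = (D + D') / 2;                     % force symmetry; real drive times can differ by direction
y = squareform(D);                    % condensed distance vector in the form linkage expects
Z = linkage(y, 'average');            % agglomerative tree built from the travel distances
groups = cluster(Z, 'maxclust', 2);   % cut into 2 regions here; use 8 for the real 155 sites
```

With these numbers A and C land in the same region and B is on its own, which is the grouping the straight-line coordinates would miss. For the real problem D would be 155x155.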