Optimization problem in connected graphs with profits

329 views Asked by At

I am trying to develop an algorithm to solve a problem that I am not able to classify, I expose the subject:

You have a map divided into sections that have a certain area and where a certain number of people live.

The problem consists of finding sets of connected sections whose area does not exceed a certain value maximizing the number of selected inhabitants.

For now I can think of two approaches:

  • Treat the problem as an all-pairs shortest paths problem in an undirected graph with positive natural values where the solutions that do not meet the constraint of the maximum selected area will be discarded. For this you could use the Floyd-Warshall algorithm, Dijkstra for all pairs or Thorup algorithm (which can be done in time V * E, where these are the vertices and edges of the graph).
  • Treat it as an open vehicle routing problem with profits where each vehicle can start and end wherever it wants (open vehicle routing problem with profits or OVRPP).
  • Another aproach

Also, depending on the combinatorics of the particular problem it is possible in certain cases to use genetic algorithms, together with tabu search, but this is only for cases where finding an optimal solution is inadmissible.

To be clearer, what is sought is to obtain a selection of connected sections whose sum of areas does not exceed a total area. The parameter to maximize is the sum of populations of the selected sections. The objective is to find an optimal solution.

For example, this is the optimal selection with max area of 6 (red color area)

enter image description here

Thank you all in advance!

4

There are 4 answers

4
ldog On BEST ANSWER

Originally was going to leave this as a comment, as it doesn't constitute a complete answer with coded solution, but I believe it may approximate the canonical one.

Your problem has been studied extensively (perhaps not this particular variation, but many others including this one.) It is a variant of the knapsack problem and is NP-complete or harder (NP-hard). This should be obvious since a special case of your problem is when all nodes are connected to all other nodes, which is precisely the 0-1 knapsack problem.

In reference (1), the authors show an approximation algorithm that approximates with lower and upper bounds of

(1−ε)/2 ·(1 − 1/e^(1−ε))

and

1 − 1/e + ε

respectively, which is probably as close as you will get in terms of a good approximation algorithm.

There is a simple variation of the dynamic programming solution to the 0-1 knapsack problem that would yield a good approximation, which I can discuss some more if there is interest and time.

(1) Borradaile, Glencora, Brent Heeringa, and Gordon Wilfong. "The knapsack problem with neighbour constraints." Journal of Discrete Algorithms 16 (2012): 224-235.

6
D.W. On

One pragmatic approach would be to formulate this as an instance of integer linear programming, and use an off-the-shelf ILP solver. One way to formulate this as an ILP instance is build a graph with one vertex per section and an edge between each pair of adjacent sections; then, you want to select a connected component in that graph.

So, let x_{v,d} be a set of zero-or-one variables, one for each vertex v and for each d=0,1,..,n-1, where n is an upper bound on the number of vertices that might be selected (e.g., the total number of vertices in the graph). Add constraints to enforce that there is exactly one vertex, call it r, for which x_{r,0}=1. The intended meaning is that at each step we grow the size of the selected connected component, by adding vertices that are adjacent to ones that were already selected. After step 0, we have selected r. If x_{v,d}=1, then v is part of the connected component selected after d steps (and thus there is guaranteed to be a path of length <=d from r to v). It follows that the connected component we ultimately end up with, after all the steps complete, contains the vertices v where x_{v,n-1}=1.

You can enforce this meaning by adding the linear inequalities x_{v,d} >= x_{v,d-1}, and x_{v,d} <= x_{v,d-1} + sum_u x_{u,d-1} where the sum is over all u that are adjacent to v, and sum_v x_{v,0} = 1.

Finally, you have a constraint that the total area does not exceed the maximum: sum_v A_v x_{v,n-1} <= maxarea, where A_v is the area of section v.

Then your goal is to maximize sum_v P_v x_{v,n-1}, where P_v is the population of section v. The solution to this integer linear programming problem will give the optimal solution to your problem.

2
ravenspoint On
 - Construct empty list of combined sections
 - LOOP s1 over sections
     - set found false
     - LOOP s2 over sections
         - IF s1 = s2 
             - CONTINUE 
         - IF s1,s2 not connected
             - CONTINUE
         - IF s1,s2 combined population > maximum
             - CONTINUE
         - ADD s1,s2 to list
         - set found true
         - break out of loop s2
     - IF found
         - CONTINUE loop s1

 - SET found = true
 - WHILE ( found )
      found = false
      - LOOP C over combination in list
          - LOOP U over uncombined sections
             - IF not connected
                 - CONTINUE
             - IF combined population > maximum
                 - CONTINUE
     - REPLACE C in list with C + U
     - found = true

- Calculate value of list ( sum of populations in sections in list )

Now you have an initial feasible solution and can generate more

- WHILE( reasonable time limit not reached )
     - random shuffle the order of sections used by the loops
     - run the the above algorithm to get a different feasible solution
     - IF new solution has a greater value than previous, replace

Here is the code to find a reasonable feasible solution

void combine()
{
    raven::set::cRunWatch aWatcher("combine sections");

    // Finds pairs of sections that can be combined
    for (auto &a1 : vSection)
    {
        // check for already combined
        if (a1.myfCombined)
            continue;
        
        // search for possible section to combine with a1
        for (auto &a2 : vSection)
        {
            // is it uncombined
            if (a2.myfCombined)
                continue;

            // is it itself
            if (a1.myName == a2.myName)
                continue;

            // is the the combined area under the limit
            if (a1.myArea + a2.myArea > maxArea)
                continue;

            // is it physically connected
            if (!a1.isConnected(a2))
                continue;

            // OK to combine
            cCombined comb;
            comb.add(a1);
            comb.add(a2);
            theCombined.push_back(comb);
            break;
        }
    }

    // Try adding uncombined sections to the combinations already found

    bool fimproved = true;
    while (fimproved)
    {
        // loop over the combinations until no more new combinations found
        fimproved = false;
        for (auto &C : theCombined)
        {
            // loop over uncombined seaction for possible addition to combimation
            for (auto &U : vSection)
            {
                // it it uncombined
                if (U.myfCombined)
                    continue;

                // is the the combined area under the limit
                if (C.myArea + U.myArea > maxArea)
                    continue;

                // is it physically connected
                if (!C.IsConnected(U))
                    continue;

                // OK, add to combination
                C.add(U);
                fimproved = true;
            }
        }
        // check we are still finding additional combinations
        if (!fimproved)
            break;
    }
}

The complete application code is at https://github.com/JamesBremner/so75423308

Here is the result of a unit test on a 10 section randomly generated problem

sections
S0 area 2 pop 68 connected 9 4 8 2
S1 area 1 pop 6 connected 7 5 2 6 4
S2 area 1 pop 54 connected 1 6
S3 area 1 pop 96 connected 6 1 8 9 2 7 5
S4 area 1 pop 4 connected 2
S5 area 2 pop 74 connected 1 3 8 7
S6 area 1 pop 63 connected 7 9 3 1 8 5 0
S7 area 1 pop 89 connected 0 2 4 8 6 5
S8 area 1 pop 30 connected
S9 area 1 pop 7 connected 3

combinations
S0 S2 S8 S9  | area 5 pop 159
S1 S4  | area 2 pop 10
S3 S5  | area 3 pop 170
S6 S7  | area 2 pop 152

Here is the result of a timing test on 2,000 node randomly generated problem

raven::set::cRunWatch code timing profile
Calls           Mean (secs)     Total           Scope
       1        0.0299339       0.0299339       combine sections

Conclusion: The code generates a feasible solution for your worst case problem in 30 milliseconds, so, for your performance requirement of 2 minutes, 4,000 possible solutions can be checked. This may not be enough to guarantee the optimal will be found but you should get quite close. For smaller problems the optimal is guaranteed.

Here is a timing profile where 4,000 combinations are tried and the best result is stored

Try 18 best value now 100542
Try 101 best value now 100552
Try 183 best value now 100577
Try 308 best value now 100582
raven::set::cRunWatch code timing profile
Calls           Mean (secs)     Total           Scope
    4000        0.0328638       131.455 try
    4001        0.0327868       131.18  combine sections

Note that because of the strict limits ( area and physical connection ) on which sections are candidates to be combined, there are a lot less than 4,000 feasible combinations. Many different initial arrangements result in the same combination generated.

If you have an actual, real world problem please post it and I will run my code on it for you.

4
Olivier On

With thousands of sections, the problem has a high combinatorial complexity so you will probably need a heuristic. I propose this one. The idea is to start with a single section and expand it iteratively by selecting the neighbour with the highest ratio population / area. More precisely:

  1. Take an acceptable section s (i.e. with area(s) <= maxArea) and put it into a set : S = {s}.

  2. Consider all neighbour sections of S. Sort them by the ratio population / area in decreasing order. Find the first acceptable t element from that list (i.e. such that area(S) + area(t) <= maxArea).

  3. If an element t was found, add it to S and go back to step 2. If no element was found, you now have a candidate solution S.

  4. Take the next acceptable section and go back to step 1.

When it's done, you have a number of candidate solutions. Select the one with the highest population.

It's possible to improve this algorithm with a stochastic variant: instead of always taking the first acceptable element in the list, draw a random number to decide which one you select (e.g. 90% probability for the first one and 10% for the second). For every initial section, repeat the loop (steps 2 and 3) a number of times and see if it beats the determinisitic solution.