Algorithm for Colored Clique of Size K

276 views Asked by At

I abstracted a real-world problem to a graph-theory-problem. Currently, I try to solve this problem by making use of a clique-algorithm from networkx library. I need ideas how my algorithm can be adapted to improve run-time performance.

Problem:

Let G be a undirected, colored graph and let R be a mapping, where each color is mapped to a given quantity, e.g.

R = {"red": 3, "blue": 1, "green": 1, "magenta": 1, "orange": 1}

I need to find a set of nodes V, with following properties (or determine that no such set exists):

  1. For each (color,amount) in R, we need at least the amount of colored nodes in V.

    (with above R for example, we need at least three red nodes, one blue node, etc. in V)

  2. The nodes in V are not allowed to be neighbors in G.

Simple Example:

With this colored graph and the above given requirement R, one possible solution is: V = {0, 1, 2, 8, 13, 15, 16}, because all the color-amounts required by R are contained in V and no node is adjacent to the other.

enter image description here

Current Approach:

  • Create complement graph CG from G
  • Iteratively find cliques and check, wether the amount of colored nodes in clique is as required by R.
import networkx as nx

def find_result(G, R):
    CG = nx.complement(G)
    for V in nx.find_cliques(CG):
        color_count = get_color_count(G, V)
        if all([(color_count.get(color, 0) >= R[color]) for color in R.keys()]):
            return V
        
# Helper method to count colors
def get_color_count(G, V):
    colors = [G.nodes[v]["color"] for v in V]
    count = dict()
    for color in colors:
        count[color] = count.get(color, 0) + 1
    return count

Performance Issues:

With this trivial approach, I run into performance problems (i.e. the algorithm takes way too long for particular configurations of R). My graphs can get as worse as seen below, with 282 nodes, 4640 edges and 7 different colors.

So here is my question: how can the networkx-clique-algorithm be customized to make use of color information, clique-size, etc.? I'm glad for any ideas/hints regarding improvements or new approaches.

enter image description here

EDIT 1

Here is the graph-content (sparse6-format) from the big graph above. The content can be pasted into a file and then read with networkx.

>>sparse6<<:~?CY_A??@_??OA_??OA?M??@?G?oC_??OA?K@?D_??OA?K@?D?Y??@?G?oC?S@_F_??OA?K@?D?W@oG_??OA?K@?D?W@oG?e??@?G?oC?S@_F?_AOI`CCGO@ED?R`WBOT`_B_V`gEW[@mEo[`qF?]`}DWV`eGwYAKHG[@wFw\AYI_??C?_B?O@OE?[A?H?gAwk???OA?K@?D?W@oG?cA_J_??OA?K@?D?W@oG?cA_Ja{FOf_??OA?K@?D?W@oG?cA_J_??OA?K@?D?W@oG?cA_JbO??@?G?oC?S@_F?_AOI?mM?__??OA?K@?D?W@oG?cA_JbkBw|AQB_Wc?Nw~CANp?CE??@?G?oC?UPOK?sLwK?sLpD_oBOvB{OPDCYB?LB[O@ACSP`F_oBPDCWPpG_oBPDCWPpGCeB?LB{OPDCWPpGCcQgK?sO@ACSP`FC_QPICmB?LB{O@DCWPpGCcQ`JCqB?LB{O@DCWPpGCcQ`JCoRWK?sNp?CCPPEC[Q@HCgQpKCsRgK?sNp?CGPPEC[Q@HCgQpKCsR`N_oBP@CGPPEC[Q@HCgQpKCsR`NDAB?LCCO`DCWPpGCcQ`JCoRPMC{S@P_oBO~CCO`DCWPpGCcQ`JCoRPMC{S@PDIB?LC?OPACSP`FC_QPICkR@LCwRpODCS`RdWTW~C?RPMC{S@TDYNp@C[QpNDKTPUD]TPUD[UHTDWTpWDeNp?CsR`ND?TPUD[U@XDiNp@C[QpNDKTPUD[U@XDgUwq_??OA?K@?D?W@oG?cA_JBCK`\bGVP]dsV`^_??OA?K@?D?W@oG?cA_JBCVP]D{WH\DwVp_EE??@?G?oC?SOw??C?_B?O@PBEM??@?G?oC?SOpbEQOpbEOXXbEOXPeeKX@dEWXxBEKX@dEWXpgeKX@dEWXpgEeWpcESX`fE_YPi_??OA?K@?D?W@oG?cA_JBKLG??C?_B?O@OE?[A?H?gAosEq??@?G?oC?S@_F?_AOI?kL@dE_YpkEu??@?G?oC?S@_F?_AOI?kL@kEsZg??C?_B?O@OE?[A?H?gAosC?O`GCoS@SEoZPmE}KpkEsZ`nFAZ@lEwZpoFEXPgEkZ@lEwZpoFC[hkEsZ`nF?[PqFMO@AC_R@ODOZ@lEwZpoFC[`rFQKphEgYpkEsZ`nF?[PqFK\@tecY`jEoZPmE{[@pFG[psFS\hdE_YPiEkZ@lEwZpoFC[`rFO\PuF]YPiEkZ@lEwZpoFC[`rFO\PuF[]H?CGQ@KD?T@hEgYpkEsZ`nF?[PqFK\@tFW\pwFeKpkEsZ`nF?[PqFK\@tFW\pwFc]hkEsZ`nF?[PqFK\@tFW\pwFc]`zeSY@jEoZPmE{[@pFG[psFS\`vF_]PyFk^HkEsZ`nF?[PqFK\@tFW\pwFc]`zFo^X?CGQ@KD?T@kEsZ`nF?[PqFK\@tFW\pwFc]`zFo^P}bKOPADCS`RDOZ@lEwZpoFC[`rFO\PuF[]@xFg]p{Fs^`~cCO`PDGSpSEoZPmE{[@pFG[psFS\`vF_]PyFk^@|Fw^q?cCO`PDGSpSESY@jEoZPmE{[@pFG[psFS\`vF_]PyFk^@|Fw^q?GEOPADCS`RDOZ@lEwZpoFC[`rFO\PuF[]@xFg]p{Fs^`~G?_QAc?OPAC_R@ODCS`RDOZ@lEwZpoFC[`rFO\PuF[]@xFg]p{Fs^`~G?_QAGM??@?G?oE?[A?H_??OA?W@oGGU??@?G@_F?_Np?CsR`ND?TpZGS`g??K@_HGS`aFgS`aFGaNp?CsR`ND?TpZGS`aFG_aW??K@_HGS`aFG_aQIgS`aFG_aQIGmNp?CsR`ND?TpZGS`aFG_aQIGkbIN?G@OG?kbiMG}?_D?_AqMG{cGM?{E?zByB_WBwcgM@_NaQHMB_WBwcaRHQBozHGcqSHUcaRHOdQUhGcqSHSdaVhGcqSHSdaVHaBozCOcaRHOdQUH[eAXcOcaRHOdQUH[eAXHiPAQHKdATHWdqWHceaZcOcaRHOdQUH[eAXHgeq[_{MqQHKdATHWdqWHceaZHofYQHKdATHWdqWHceaZHofQ]hGcqSHSdaVH_eQYHkfA\HwfyQHKdATHWdqWHceaZHofQ]H{gIbEs[PqFK\@tF[^A@e{[PqFK\@tFc^aBIMO@AC_R@ODO[@pFG[psFS]`~GOgqces[`vFk^@|Fw^q@IKhAde{\@xFk^@|Fw^qBIKhAdIYO@AC_R@ODO[@tFg]p{Fs^`~GOgqcIShafcCO`PDGSpSEs[`vFo_A@GG_qCIKhAdIWhqgcCO`PDGSpSE{\@xFw_A@GG_qCIKhAdIWhqgIeO@@CGQ@KD?SPQDKT@oFS]`~G?_QAGK`AbIOhQeI[iAhIi??@?G?oE?[A?HGS`aFG_aw??C?_E?[A?~C?RPMC{S@VDk`QEG[aaLIq??B?WAQDG_aqKGsjAlb{O@LCwRpOD[UqFGgaqKGsjAlIy??@?G?oC?S@_F?_AOI?kIOkBY??@?G?oC?S@_F?_AOI?kIOkJA??@?G?oC?S@_F?_AOI?kIOkHwfq_ICkApacLaoJCkgE?[A?H?gAohJ?kQqJMIQ]H{gA`J?kQqJKlGhBWaqKGsjanJ?kQqJKlAt_W@oG?cA_JAcaqKGsjanJ?kQqJKlAtJYIQJGobQ]H{gA`IwjqoJCkarJOlQuJ]??@?G?oC?S@_F?_AOI?kJ?uH?cQoJCkarJOlQuJ[mG??C?_B?O@OE?[A?H?gAokH?cQoJCkarJOlQuJ[mAx_??OA?K@?D?W@oG?cA_JAocAPHwfq_ICkApJGkqsJSlavJ_mQybWcAPJ?kQqJKlAtJWlqwJcmaz_W@oG?cA_JH?cQoJCkarJOlQuJ[mAxJgmq{h?cQ]H{gA`J?kQqJKlAtJWlqwJcmazJonWuGkbALH?cQmI{kApJGkqsJSlavJ_mQyJknA|Jy@_F?_AOI?kaqKGscAPIwjqoJCkarJOlQuJ[mAxJgmq{Jsna~gkbALH?cQ]H{gA`IwjqoJCkarJOlQuJ[mAxJgmq{Jsna~KA??@?G?oC?S@_F?_AOI?kJ?uHSeQ\ICkApJGkqsJSlavJ_mQyJknA|Jwnr?KE??@?G?oC?S@_F?_AOI?kJATHcfQ`J?kQqJKlAtJWlqwJcmazJonQ}J{oB@KI??@?G?oC?S@_F?_AOI?kJATHcfQ]H{gA`J?kQqJKlAtJWlqwJcmazJonQ}J{oB@KGowuHSeQ\ICkApJGkqsJSlavJ_mQyJknA|Jwnr?KCobBKQ@_F?_AOI?kdQXHsgQoJCkarJOlQuJ[mAxJgmq{Jsna~K?oRAKKpBDhSeQ\Hwfq_ICkApJGkqsJSlavJ_mQyJknA|Jwnr?KCobBKOpREbWaqKGsdQXHsgQmI{kApJGkqsJSlavJ_mQyJknA|Jwnr?KCobBKOpREK]@_F?_AOI?kaqKGsdQXHsgQmI{kApJGkqsJSlavJ_mQyJknA|Jwnr?KCobBKOpREK[qIJGobQTHcfQ]H{gA`IwjqoJCkarJOlQuJ[mAxJgmq{Jsna~K?oRAKKpBDKWprGKe??@?G?oC?S@_F?_AOI?kG?oB_OpbEWYW??C?_B?O@OE?[A?H?gAo_B?MADG_aqkIwqw??C?_B?O@OE?[A?H?gAo_B?M@UDgqrK_??OA?K@?D?W@oG?cA_JA?K?wEs[`vFo_QbIWiRJKorW??C?_B?O@OE?[A?H?gAo_B?MBJKorRM_??OA?K@?D?W@oG?cA_JA?M?xCKWpeEcqrKKsrbN_??OA?K@?D?W@oG?cA_JA?M?xGSaAJIojbJKorRMK{sG??C?_B?O@OE?[A?H?gAo_B_MPUDgqrKKsrbNL?sW??C?_B?O@OE?[A?H?gAo_B_MPlFG\p{GCgqeIcqrKKsrbNL?sRQ_??OA?K@?D?W@oG?cA_JA?M?xKkrBLKwrrOLCsbRa?M@BDST`VD_WpeEcqrKKsrbNL?sRQLKtG??K@_HA?M@TDWTpWGSaAJIojbJKorRMK{sBPLGsrSLUG?wDST`VD_UbJKorRMK{sBPLGsrSLStg_B_TPUD[U@lFG\p{GCgqeIcqrKKsrbNL?sRQLKtBTLWtw_B_TPUD[UBJKorRMK{sBPLGsrSLStbVLa??@?G?oC?S@_F?_AOI?kK@BDSUPbEWYRJKorRMK{sBPLGsrSLStbVL_uW??C?_B?O@OE?[A?H?gAooDSUQDG_aqkIwqrKKsrbNL?sRQLKtBTLWtrWLcug??C?_B?O@OE?[A?H?gAooDST`XDgqrKKsrbNL?sRQLKtBTLWtrWLcubZ_??OA?K@?D?W@oG?cA_JB?TPXEs[`vFo_QbIWiRJKorRMK{sBPLGsrSLStbVL_uRYLkvG??C?_B?O@OE?[A?H?gAooDSURJKorRMK{sBPLGsrSLStbVL_uRYLkvB\_??OA?K@?D?W@oG?cA_JBcOpTDcWpeEcqrKKsrbNL?sRQLKtBTLWtrWLcubZLovR]_??OA?K@?D?W@oG?cA_JBcTPXGSaAJIojbJKorRMK{sBPLGsrSLStbVL_uRYLkvB\Lwvw??C?_B?O@OE?[A?H?gAoxDST`XDgqrKKsrbNL?sRQLKtBTLWtrWLcubZLovR]L{wG??C?_B?O@OE?[A?H?gAoxDSUPlFG\p{GCgqeIcqrKKsrbNL?sRQLKtBTLWtrWLcubZLovR]L{wB`_??OA?K@?D?W@oG?cA_JBcTPXKkrBLKwrrOLCsbRLOtRUL[uBXLgur[Lsvb^M?wRacKTPUD[U@XEKX`hKkrBLKwrrOLCsbRLOtRUL[uBXLgur[Lsvb^M?wRaMM??B?WAPTDWTpWDc`QGGkjAmKkrBLKwrrOLCsbRLOtRUL[uBXLgur[Lsvb^M?wRaMKxHTDWTpWDcUbJKorRMK{sBPLGsrSLStbVL_uRYLkvB\Lwvr_MCwbbMOxXTDWTpWDcZPqF[^A@IKhahKkrBLKwrrOLCsbRLOtRUL[uBXLgur[Lsvb^M?wRaMKxBdMYTPUD[U@XKkrBLKwrrOLCsbRLOtRUL[uBXLgur[Lsvb^M?wRaMKxBdMWxxdE_YPiEkZ`rFW\pwFc]`|GIYPiEk\`vF_]PyHwfq_ICkatJ_mq}KCpBFKgyXdE_YPiEkZ`rFW\pwFc]`|GG`aHGoyRiecY`jFW\pwFc]aEGcbA]H{gA`JGlQwJknb@KOprIMcybjeSY@hEgYpmFK\`vF_]PyFs_aJGobQmI{lavJ_nr?KCqBHKgyRiMkzHhEgYpuF[]@xFgaqKGsfa^I?gQmI{katJWlqwJkna~K?oRCK[qBHKgyRiMkzBleSY@jEw[pwFs_aTHcfQ`KGorCKSpbFK_qRIMcybjMozRmhSeQ\Hwfq_ICkatJ_mq}KCobBKOpREK[qBHKgyRiMkzBlMwzxdE_YpmFK]@|GG`aHGodQXHsgRAKKpBDKWprGKcqbhMgyrkMszbnNA`aHGodQXHsfa^I?gQqJSmAzJwoRAKKpBDKWprGKcqbhMgyrkMszbnN?{XdE_YpmFK]@|GGaqKGsdQXHsgQmI{lavJ_nr?KCobBKOpREK[qBHKgyRiMkzBlMwzroNC{iJGobQTHcfQ]H{gA`IwjqqJSlavJ_mq}J{oB@KGorCKSpbFK_qRIMcybjMozRmM{{BpNG{x\EA|X\E?|RunS|bvdsWBtNW|rwnS|bvN_}X\E?|RuN[}BxNi|RuN[}BxNg}wNAOM_zBscaUHgfgNAOI?zBscaUHgfb|_??OA?K@?D?W@oG?cA_JAcJ?uJ?kQqJKlAtJWlqwJcmazJonrAKKpBDKa??@?G?oC?S@_F?_AOI?kIOkJ?kQqJKlAtJWlqwJcmazKGorCN}??@?G?oC?S@_F?_AOI?kIOkHwfq_ICkApJGkqsJSlavJ_mQyJknb@KGorCK[qbiMozboNG|B~OA?_D?_AohBWbqPJ?kQqJKlAtJWlqwJcnA~KGpRGN|?C@_G?oC?SA?H?gAohG{cQoJCkarJOlQuJ[mB~O@?SA_G@OG?kIQNHCfa^I?gQoJCkarJOlQuJ[mAzJwoRCK[qbiMozboNG|B~O@?SAOMIOuGkbALIwjqoJCkarJOlQuJ[mAxJonr?KCobDK_qRIMszbrNO~s?OD?cBOQ?oC?SAOI?kIQJGobQmI{kApJGkqsJSlavJ_nr?KCqBHKgzRmNK|B~O@?SAOL@CDacaqKGsfa^I?gQmI{kApJGkqsJSlavJ_mq}J{oB@KOprGKcqbiMozRmN?{brNO~s?OD?cBOP@SE_??OA?K@?D?W@oG?cA_JAoLaoJCkarJWmQyJknA~KGorCKSqB~O@?SAOL@CDOX@w??C?_B?O@OE?[A?H?gAokJ?kQqJcmazKGorCN|?C@OH?sCOT@cFOa??@?G?oC?S@_F?_AOI?kJA]H{gA`J?kQqJSmAxJgmq}KCobBKOprIMgzBmN?{bsN|?C@OH?sCOT@cFO`AW@?G@?D?[A?I?kLaNHCkArJWmQ{J{obDK_~s?OD?cBOP@SEO\ACHOi?OA?K@?D?[A?H?gAqNHC~s?OD?cBOP@SEO\ACHOhAw@?G@?D?[A?I?kbqPHwfq_ICkatJ_mq}KCpBFKgybkMw{BqNO~s?OD?cBOP@SEO\ACHOhAsK_C@?F?gLaJGobQmI{kArJWlqwJcnA~K?oRAKSqBHKgzRmNK|B~O@?SAOL@CDOX@sGOdAcJOpBW@?K@?D?[AOI?kaqKGsjanJWlqwJ{oB@K_qRIMszbrNO~s?OD?cBOP@SEO\ACHOhAsKOtBg@?O@oIGkbALHwfq_ICjanJGlQuJ[mAzJwnr?KCpBFK_qRIMgzBlMw{BqNK|B~O@?SAOL@CDOX@sGOdAcJOpBSMO}??@?G?oC?S@_F?_AOI?kJ?uHSeQ\ICkApJGkquJcmazJonrAKKpBDKWprGKcqbnN?{RqNK|B~O@?SAOL@CDOX@sGOdAcJOpBSMO|CG??C?_B?O@OE?[A?H?gAokHSeQ\ICkApJGmQyJkobBKOpREK[qBHKgzroNC{brNO~s?OD?cBOP@SEO\ACHOhAsKOtBcNP@CW??C?_B?O@OE?[A?H?gAokHSeQ\Hwfq_ICkApJGlQwJcmazJwoRAKKpBDKWprGKcqbiMozbnN?{RqNK|B~O@?SAOL@CDOX@sGOdAcJOpBSMO|CCPPI?_D?_AouG{cQTHcfQ`J?kquJcnA~KGorCKSpbFK_qRIM{{BpNG{rsN|?C@OH?sCOT@cFO`ASIOlBCLOxBsOPDCcR_G?oC?SA?H?gAqNHCdQXHsgRAKKpBDKWprGKcqbnN?{RqNK|B~O@?SAOL@CDOX@sGOdAcJOpBSMO|CCPPHCsS_G@OG?kbqPHSeQ\Hwfq_ICkatJ_mq}KCobBKOpREK[qBHKgybkMwzroNC{brNO~s?OD?cBOP@SEO\ACHOhAsKOtBcNP@CSQPLDCTbWaqKGsdQXHsgQmI{kArJWlqwJcnA~K?oRAKKpBDKWprGKcqblMwzroNC{brNO~s?OD?cBOP@SEO\ACHOhAsKOtBcNP@CSQPLDCTPY?oC?SAOI?kaqKGsdQXHsgQmI{lavJ_nr?KCobBKOpREK[qBHKgzRmM{{BpNG{rsN|?C@OH?sCOT@cFO`ASIOlBCLOxBsOPDCcRPPDSUP]aqKGsdQXHsfa^I?gQmI{katJWlqwJkna~K?oRAKKpBDKWprGKcqbiMozRmM{{BpNG{rsN|?C@OH?sCOT@cFO`ASIOlBCLOxBsOPDCcRPPDSUP\EN

To colorize the graph, use following implementation:

    G = nx.read_sparse6("graph_file")
    
    for i in range(0,12):
        G.nodes[i]["color"] = "yellow"
    for i in range(12,40):
        G.nodes[i]["color"] = "red"
    for i in range(40,63):
        G.nodes[i]["color"] = "turquoise"
    for i in range(63,85):
        G.nodes[i]["color"] = "green"
    for i in range(85,163):
        G.nodes[i]["color"] = "blue"
    for i in range(163, 176):
        G.nodes[i]["color"] = "magenta"
    for i in range(176, 282):
        G.nodes[i]["color"] = "orange"  
1

There are 1 answers

0
David Eisenstat On BEST ANSWER

Formally, we have an independent set problem with some covering constraints on the side. Integer programming seems well suited, though I don’t know which R specifically were giving you trouble.

The straightforward integer programming formulation of independent set has a 0–1 variable for each node and a constraint for each edge. The linear relaxation is poor, however, since (for example) a triangle graph has a fractional solution where each node is 0.5, for a “1.5-node” fractional independent set. Since G has only a few hundred maximal cliques, however, we can use a better formulation with a constraint for each maximal clique (for each clique, the independent set contains at most one clique node).

Implementation using NetworkX and OR-Tools below:

import networkx as nx

G = nx.read_sparse6("graph_file")

for i in range(0, 12):
    G.nodes[i]["color"] = "yellow"
for i in range(12, 40):
    G.nodes[i]["color"] = "red"
for i in range(40, 63):
    G.nodes[i]["color"] = "turquoise"
for i in range(63, 85):
    G.nodes[i]["color"] = "green"
for i in range(85, 163):
    G.nodes[i]["color"] = "blue"
for i in range(163, 176):
    G.nodes[i]["color"] = "magenta"
for i in range(176, 282):
    G.nodes[i]["color"] = "orange"


R = {
    "red": 11,
    "turquoise": 11,
    "blue": 8,
    "orange": 6,
    "green": 4,
    "magenta": 2,
    "yellow": 1,
}


import collections
from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver("SCIP")
x = {v: solver.BoolVar(str(v)) for v in G.nodes}
for K in nx.find_cliques(G):
    solver.Add(sum(x[v] for v in K) <= 1)
nodes_with_color = collections.defaultdict(list)
for v in G.nodes:
    nodes_with_color[G.nodes[v]["color"]].append(v)
for color, bound in R.items():
    solver.Add(sum(x[v] for v in nodes_with_color[color]) >= bound)
solver.Maximize(sum(x_v for x_v in x.values()))
status = solver.Solve()
if status == pywraplp.Solver.OPTIMAL:
    solution = [v for (v, x_v) in x.items() if x_v.SolutionValue()]
    print(*solution)
    print(collections.Counter(G.nodes[v]["color"] for v in solution))
elif status == pywraplp.Solver.INFEASIBLE:
    print("no solution")
else:
    print("unknown status", status)