How can I efficiently randomly select items from a dictionary that meet my requirements?

85 views Asked by At

So at the moment, I have a large dictionary of items. Might be a little confusing, but each of these keys have different values, and the values themselves correspond to another dictionary.

I need to make sure that my random selection from the first dict covers all possible values in the second dict. I'll provide a rudimentary example:

Dict_1 = {key1: (A, C), key2: (B, O, P), key3: (R, T, A)} # and so on 

Dict_2 = {A: (1, 4, 7), B: (9, 2, 3), C: (1, 3)}  # etc

I need a random selection of Dict_1 to give me a coverage of all numbers from 1 - 10 in Dict_2 values.

At the moment, I am selecting 6 random keys from Dict_1, taking all the numbers that those letters would correspond with, and comparing that set to a set of the numbers from 1 - 10. If the selection isn't a subset of 1 - 10, select 6 more random ones and try again, until I have 1 - 10.

Now, this works, but I know it's far from efficient. How can I improve this method?

I am using Python.

1

There are 1 answers

0
Stef On

At the moment, I am selecting 6 random keys from Dict_1, taking all the numbers that those letters would correspond with, and comparing that set to a set of the numbers from 1 - 10. If the selection isn't a subset of 1 - 10, select 6 more random ones and try again, until I have 1 - 10.

Now, this works, but I know it's far from efficient. How can I improve this method?

In case your solution doesn't fully cover 1-10, you're erasing the whole solution and restarting completely from scratch.This is what's inefficient.

Instead, you could use an approach inspired by simulated annealing or random nearest neighbour search. The idea is that if your solution doesn't fully cover 1-10, then instead of erasing it, you try to incrementally make it better.

One way to do this is to attribute a score to each of the six keys in your solution. This score should reflect how useful that key is in the solution; i.e., how many numbers in 1-10 are covered thanks to this key that are not already covered by another key.

Then, instead of picking six new random keys, you keep the five best keys and only pick one new random key. The solution should become incrementally better, until hopefully it covers the whole range 1-10.

import random

keylist1 = ['key{}'.format(n) for n in range(100)]
keylist2 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
cover_range = range(1,21)  # 1-20 instead of 1-10 otherwise the problem is too simple

d1 = {k: random.choices(keylist2, 3) for k in keylist1}
# d1 = {'key0': ['N', 'C', 'L'], 'key1': ['P', 'N', 'M'], 'key2': ['I', 'G', 'Q'], 'key3': ['F', 'M', 'R'], 'key4': ['L', 'P', 'U'], 'key5': ['V', 'Q', 'L'], 'key6': ['R', 'W', 'K'], 'key7': ['T', 'S', 'I'], 'key8': ['W', 'M', 'T'], 'key9': ['A', 'K', 'Q'], 'key10': ['J', 'I', 'L'], 'key11': ['F', 'X', 'D'], 'key12': ['L', 'J', 'B'], 'key13': ['A', 'W', 'I'], 'key14': ['L', 'R', 'Y'], 'key15': ['V', 'O', 'Z'], 'key16': ['G', 'U', 'B'], 'key17': ['R', 'G', 'S'], 'key18': ['X', 'C', 'V'], 'key19': ['S', 'F', 'Z'], 'key20': ['J', 'S', 'L'], 'key21': ['E', 'P', 'X'], 'key22': ['L', 'X', 'E'], 'key23': ['B', 'L', 'O'], 'key24': ['B', 'T', 'W'], 'key25': ['H', 'V', 'Y'], 'key26': ['J', 'T', 'C'], 'key27': ['M', 'G', 'A'], 'key28': ['I', 'E', 'P'], 'key29': ['L', 'R', 'N'], 'key30': ['V', 'J', 'B'], 'key31': ['I', 'V', 'T'], 'key32': ['E', 'N', 'W'], 'key33': ['W', 'D', 'M'], 'key34': ['E', 'Q', 'P'], 'key35': ['C', 'Z', 'A'], 'key36': ['T', 'X', 'O'], 'key37': ['B', 'D', 'J'], 'key38': ['N', 'M', 'D'], 'key39': ['E', 'B', 'A'], 'key40': ['A', 'B', 'K'], 'key41': ['Z', 'B', 'O'], 'key42': ['G', 'L', 'A'], 'key43': ['P', 'N', 'H'], 'key44': ['Z', 'W', 'M'], 'key45': ['K', 'A', 'J'], 'key46': ['O', 'B', 'L'], 'key47': ['J', 'Z', 'F'], 'key48': ['C', 'D', 'O'], 'key49': ['F', 'B', 'J'], 'key50': ['H', 'V', 'T'], 'key51': ['A', 'L', 'O'], 'key52': ['N', 'T', 'Q'], 'key53': ['F', 'N', 'D'], 'key54': ['K', 'W', 'V'], 'key55': ['A', 'M', 'E'], 'key56': ['Z', 'J', 'A'], 'key57': ['S', 'B', 'W'], 'key58': ['D', 'S', 'P'], 'key59': ['E', 'Y', 'H'], 'key60': ['C', 'S', 'Y'], 'key61': ['L', 'P', 'M'], 'key62': ['H', 'S', 'N'], 'key63': ['S', 'U', 'J'], 'key64': ['J', 'N', 'R'], 'key65': ['E', 'B', 'W'], 'key66': ['B', 'V', 'Q'], 'key67': ['K', 'V', 'L'], 'key68': ['N', 'Z', 'H'], 'key69': ['O', 'U', 'E'], 'key70': ['E', 'W', 'H'], 'key71': ['W', 'P', 'A'], 'key72': ['G', 'W', 'X'], 'key73': ['Z', 'D', 'Q'], 'key74': ['S', 'Y', 'P'], 'key75': ['C', 'A', 'I'], 'key76': ['E', 'V', 'S'], 'key77': ['F', 'M', 'T'], 'key78': ['L', 'E', 'S'], 'key79': ['E', 'T', 'J'], 'key80': ['J', 'Y', 'A'], 'key81': ['I', 'F', 'G'], 'key82': ['D', 'S', 'L'], 'key83': ['F', 'E', 'P'], 'key84': ['X', 'L', 'T'], 'key85': ['H', 'U', 'M'], 'key86': ['W', 'A', 'C'], 'key87': ['Z', 'L', 'K'], 'key88': ['Y', 'N', 'X'], 'key89': ['F', 'K', 'B'], 'key90': ['Q', 'G', 'W'], 'key91': ['U', 'O', 'W'], 'key92': ['N', 'C', 'L'], 'key93': ['O', 'V', 'P'], 'key94': ['D', 'Y', 'R'], 'key95': ['S', 'K', 'I'], 'key96': ['G', 'Y', 'R'], 'key97': ['T', 'Z', 'G'], 'key98': ['C', 'A', 'Q'], 'key99': ['H', 'I', 'W']}
d2 = {c: random.sample(cover_range, 3) for c in keylist2}
# d2 = {'A': [14, 10, 17], 'B': [11, 20, 15], 'C': [11, 9, 8], 'D': [6, 18, 19], 'E': [18, 7, 1], 'F': [9, 14, 12], 'G': [17, 18, 20], 'H': [17, 12, 8], 'I': [17, 7, 5], 'J': [8, 20, 5], 'K': [17, 7, 13], 'L': [1, 18, 20], 'M': [5, 8, 18], 'N': [15, 17, 10], 'O': [16, 20, 18], 'P': [2, 18, 7], 'Q': [11, 17, 6], 'R': [3, 15, 4], 'S': [5, 15, 6], 'T': [6, 15, 20], 'U': [20, 12, 8], 'V': [20, 16, 3], 'W': [2, 16, 1], 'X': [5, 11, 1], 'Y': [2, 9, 8], 'Z': [6, 3, 16]}
import random
from collections import Counter
from itertools import chain

def random_solution():
    solution = set(random.sample(keylist1, 6))
    coverage = Counter(chain.from_iterable(d2[c] for k in solution for c in d1[k]))
    while len(coverage) < len(cover_range):
        #print(solution, '    ', sorted(coverage.keys()))
        scores = {k: sum(1/coverage[n] for n in frozenset().union(*(d2[c] for c in d1[k])) ) for k in solution}
        #print(scores)
        worst_key = min(solution, key=scores.get)
        solution.remove(worst_key)
        while len(solution) < 6:
            solution.add(random.choice(keylist1))  # in while loop just because the new random key might be one of the 5 keys we already had, if we're unlucky
        coverage = Counter(chain.from_iterable(d2[c] for k in solution for c in d1[k]))
    return solution