How do I combine similar elements in a list in python?

246 views Asked by At

As a practice exercise, I'm trying to generate a color pallette from an image, and am using Python to turn the raw extracted RGB channels into a pallette. This is fairly simple using a combination of lists and dictionaries, but I'd like to be able to put a constraint the the number of colors in the pallette by combining similar color channels, unless they both have a large presence in the image.

Say I have a dictionary with my RGB values and their counts:

countsR = {"255": 301, "250": 16, "10": 589, "14": 124, "8": 132}

This pallette requires 3 bits to index into. Ideally I would run some function, combine_channels(dd, max_distance), which does some terrible O(N^2) things, and outputs something like this:

print(combine_channels(countsR, 10))
>>> {"255": 317, "10": 845}

Which now may be indexed by only one bit!

It would also be good if it could keep track of which things it replaced with what, maybe in another dictionary:

countsR, replacements = combine_channels(countsR, 10)
print(replacements)
>>> {"255": ["250"], "10": ["8", "14"]}

Any ideas on what this may look like?

2

There are 2 answers

5
niemmi On BEST ANSWER

In many cases there's multiple different grouping options as the comments show. One easy option is to iterate through channels in numerical order forming groups and if a channel can't be fitted to existing group create a new one. It won't result to maximal distance but it will guarantee minimum number of groups to be generated:

def combine_channels(channels, dist):
    result = {}
    replacements = {}
    groups = []
    group = []
    key = None

    # Iterate through channels in ascending numerical order
    for channel, count in sorted((int(k), v) for k, v in channels.items()):
        # Add new group in case that channel doesn't fit to current group
        if group and channel - key > dist:
            groups.append((key, group))
            group = []
            key = None

        # Add channel to group
        group.append((channel, count))

        # Pick a new key in case there's none or current channel is within
        # dist from first channel in the group
        if key is None or channel - group[0][0] <= dist:
            key = channel

    # Add last group in case it exists
    if group:
        groups.append((key, group))

    for key, group in groups:
        result[key] = sum(x[1] for x in group)
        replacements[key] = [x[0] for x in group if x[0] != key]

    return result, replacements

countsR1 = {"255": 301, "250": 16, "10": 589, "14": 124, "8": 132}
countsR2 = {"0": 10, "11": 20, "7": 30, "19": 40, "25": 50}
countsR3 = {"0": 5, "11": 10}
print(combine_channels(countsR1, 10))
print(combine_channels(countsR2, 10))
print(combine_channels(countsR3, 10))

Output:

({14: 845, 255: 317}, {14: [8, 10], 255: [250]})
({25: 90, 7: 60}, {25: [19], 7: [0, 11]})
({0: 5, 11: 10}, {0: [], 11: []})

Time complexity of above is O(n log n) since sorting is used.

1
ijustlovemath On

Here's what I came up with:

def combine_channels(lst, max_dist):
    colors = list(set(lst)) # unique colors
    counts = dict()
    replacements = dict()
    all_repl = []

    for el in lst:
        counts[el] = counts.get(el, 0) + 1

    N = len(colors)
    dists = np.zeros((N, N))
    for i in range(N - 1):
        for j in range(i + 1, N):
            dist = abs(colors[i] - colors[j])
            dists[i, j] = dist
            if(dist < max_dist):
                if(colors[i] in all_repl or colors[j] in all_repl):
                    continue
                else:
                    if(counts.get(colors[i], 0) > counts.get(colors[j], 0)):
                        winner = colors[i]
                        loser = colors[j]
                    else:
                        winner = colors[j]
                        loser = colors[i]
                counts[winner] += counts.get(loser, 0)
                counts[loser] = 0
                if winner not in replacements:
                    replacements[winner] = list()
                replacements[winner].append(loser)
                all_repl.append(loser)
                if loser not in replacements:
                    continue
                else:
                    replacements[winner] = replacements[winner].extend(replacements[loser])
                    replacements.pop(loser, None)              
    print(replacements)

There's probably many more efficient ways to do it, and it doesn't satisfy the maximum distance requirement, but it works!