Python - dict/set-like structures supporting unhashable keys

138 views Asked by At

In Python dict and set are hash-based hence need hashable key type.

In C++ there are also hash-based std::unordered_map and std::unordered_set.

But C++ has variants of these structures that don't need keys to be hashable - std::map and std::set. It is enough for keys to support probably just one comparison operation less < (also maybe equality needed ==). Also these structures enumerate their keys in sorted order which is a very nice property unlike unordered variants.

These std::map/std::set finds/inserts key in O(log N) amortized time, unlike hashed structures which do this in O(1) time.

Although hash-based variants are faster for big enough data, they need keys to be hashable types, but in python list/dict and nested structures are unhashable hence can't be dict/set keys without extra conversion, e.g. we can convert list/dict keys to strings using str_key = json.dumps({"x": 1, "y": [2,3]}, sort_keys = True) or to bytes using bytes_key = pickle.dumps({"x": 1, "y": [2,3]}) (but probably pickled variant produces same bytes only for really-really equal structures, haven't tested).

Is there any standard or popular non-standard (i.e. through pip modules) support for non-hash-based variant of dict/set in Python? Same like C++'s std::map/std::set implementations?

Probably these non-hash-based variants can be not to difficultly modeled through keeping sorted list of (key, value) pairs (sorted by key), but is there a ready-made code for this? Also just unsorted list can be used too by finding key position by lst.index(key), but it has find/insert time of O(N) which is very-very slow for quite large data.

Just for example and fun decided to code dict solution based on sorted list (plus bisect module) for main dict operations, it is quite efficient (O(log N)) for search/get operations, but very inefficient (O(N)) for insertion operations (as it needs moving whole array data one place to the right at insertion points):

Try it online!

class sorted_list_dict:
    def __init__(self):
        self.ks, self.vs, self.n = [], [], 0
    def reserve(self, n):
        if len(self.ks) >= n:
            return
        import math
        nn = 1 << math.ceil(math.log(max(1, n)) / math.log(2))
        self.ks += [None] * (nn - len(self.ks))
        self.vs += [None] * (nn - len(self.vs))
    def insertion_index(self, k):
        import bisect
        return bisect.bisect_right(self.ks, k, 0, self.n)
    def __contains__(self, k):
        i = self.insertion_index(k)
        return i > 0 and self.ks[i - 1] == k
    def __getitem__(self, k):
        i = self.insertion_index(k)
        if i > 0 and self.ks[i - 1] == k:
            return self.vs[i - 1]
        raise KeyError(i)
    def get(self, k, default = None):
        try:
            return self.__getitem__(k)
        except KeyError:
            return default
    def __setitem__(self, k, v):
        i = self.insertion_index(k)
        if i > 0 and self.ks[i - 1] == k:
            self.vs[i - 1] = v
        else:
            self.reserve(self.n + 1)
            self.ks[i + 1 : self.n + 1], self.vs[i + 1 : self.n + 1], self.n = (
                self.ks[i : self.n], self.vs[i : self.n], self.n + 1,
            )
            self.ks[i], self.vs[i] = k, v
    def items(self):
        for i in range(self.n):
            yield (self.ks[i], self.vs[i])

def test():
    import string, random
    random.seed(0)
    d, sld = {}, sorted_list_dict()
    for i in range(10000):
        k = string.ascii_letters[random.randrange(len(string.ascii_letters))]
        v = random.randrange(0, 10 ** 6)
        d[k], sld[k] = v, v
        assert d[k] == sld[k], (d[k], sld[k])
        assert k in sld and k + '_' not in sld, k
        assert d.get(k) == sld.get(k) and d.get(k, '__') == sld.get(k, '__')
    assert len(list(d.items())) == len(list(sld.items())), (len(list(d.items())), len(list(sld.items())))
    assert set(d.items()) == set(sld.items()), (set(d.items()), set(sld.items()))

test()

Related questions:

  1. suggested by @parktomatomi about implementations of binary tree
  2. suggested by @PascalGetreuer about implementing of hash method
0

There are 0 answers