In Python dict
and set
are hash-based hence need hashable key type.
In C++ there are also hash-based std::unordered_map and std::unordered_set.
But C++ has variants of these structures that don't need keys to be hashable - std::map and std::set. It is enough for keys to support probably just one comparison operation less <
(also maybe equality needed ==
). Also these structures enumerate their keys in sorted order which is a very nice property unlike unordered variants.
These std::map
/std::set
finds/inserts key in O(log N)
amortized time, unlike hashed structures which do this in O(1)
time.
Although hash-based variants are faster for big enough data, they need keys to be hashable types, but in python list
/dict
and nested structures are unhashable hence can't be dict/set keys without extra conversion, e.g. we can convert list/dict keys to strings using str_key = json.dumps({"x": 1, "y": [2,3]}, sort_keys = True)
or to bytes using bytes_key = pickle.dumps({"x": 1, "y": [2,3]})
(but probably pickled variant produces same bytes only for really-really equal structures, haven't tested).
Is there any standard or popular non-standard (i.e. through pip
modules) support for non-hash-based variant of dict
/set
in Python? Same like C++
's std::map
/std::set
implementations?
Probably these non-hash-based variants can be not to difficultly modeled through keeping sorted list of (key, value) pairs (sorted by key), but is there a ready-made code for this? Also just unsorted list can be used too by finding key position by lst.index(key)
, but it has find/insert time of O(N)
which is very-very slow for quite large data.
Just for example and fun decided to code dict solution based on sorted list (plus bisect
module) for main dict operations, it is quite efficient (O(log N)
) for search/get operations, but very inefficient (O(N)
) for insertion operations (as it needs moving whole array data one place to the right at insertion points):
class sorted_list_dict:
def __init__(self):
self.ks, self.vs, self.n = [], [], 0
def reserve(self, n):
if len(self.ks) >= n:
return
import math
nn = 1 << math.ceil(math.log(max(1, n)) / math.log(2))
self.ks += [None] * (nn - len(self.ks))
self.vs += [None] * (nn - len(self.vs))
def insertion_index(self, k):
import bisect
return bisect.bisect_right(self.ks, k, 0, self.n)
def __contains__(self, k):
i = self.insertion_index(k)
return i > 0 and self.ks[i - 1] == k
def __getitem__(self, k):
i = self.insertion_index(k)
if i > 0 and self.ks[i - 1] == k:
return self.vs[i - 1]
raise KeyError(i)
def get(self, k, default = None):
try:
return self.__getitem__(k)
except KeyError:
return default
def __setitem__(self, k, v):
i = self.insertion_index(k)
if i > 0 and self.ks[i - 1] == k:
self.vs[i - 1] = v
else:
self.reserve(self.n + 1)
self.ks[i + 1 : self.n + 1], self.vs[i + 1 : self.n + 1], self.n = (
self.ks[i : self.n], self.vs[i : self.n], self.n + 1,
)
self.ks[i], self.vs[i] = k, v
def items(self):
for i in range(self.n):
yield (self.ks[i], self.vs[i])
def test():
import string, random
random.seed(0)
d, sld = {}, sorted_list_dict()
for i in range(10000):
k = string.ascii_letters[random.randrange(len(string.ascii_letters))]
v = random.randrange(0, 10 ** 6)
d[k], sld[k] = v, v
assert d[k] == sld[k], (d[k], sld[k])
assert k in sld and k + '_' not in sld, k
assert d.get(k) == sld.get(k) and d.get(k, '__') == sld.get(k, '__')
assert len(list(d.items())) == len(list(sld.items())), (len(list(d.items())), len(list(sld.items())))
assert set(d.items()) == set(sld.items()), (set(d.items()), set(sld.items()))
test()
Related questions:
- suggested by @parktomatomi about implementations of binary tree
- suggested by @PascalGetreuer about implementing of hash method