Unexpected behaviour in Python OrderedSet.issuperset()

170 views Asked by At

I have two OrderedSets and I'm trying to check whether one in a subset of the other - both the elements and their order is important. However, the orderedset package is giving me strange results.

>>> import orderedset
>>> a = orderedset.OrderedSet([433, 316, 259])
>>> b = orderedset.OrderedSet([433, 316, 69])
>>> a.issuperset(b)
True

This doesn't make any sense to me because b contains a value (69) that is definitely not in a. Why is a a superset of b then?

However, when I try this:

>>> c = orderedset.OrderedSet([1, 2, 3])
>>> d = orderedset.OrderedSet([1, 2, 4])
>>> c.issuperset(d)
False

This behaviour seems inconsistent to me: why should the choice of values - [433, 316, 259] versus [1, 2, 3] - in the OrderedSet affect the output of issuperset()?

Perhaps there's a better way to do this? I need to know whether the elements in b are contained in a in the same order. Meaning that, if

a = OrderedSet([433, 316, 259])

I'm looking for partial matches within that set that start with the same starting value as a (433). This is what I want out of it:

OrderedSet([433, 316, 259])
OrderedSet([433, 316]])
OrderedSet([433])

And not:

OrderedSet([433, 259])
OrderedSet([316, 259])
OrderedSet([433, 316, 69])
OrderedSet([433, 259, 316])
OrderedSet([259, 433])
...

Basically, if this really confusing - I have an ordered set and I'm trying to find partial matches both in terms of the values and their order.

1

There are 1 answers

4
Martin Konecny On BEST ANSWER

Presumably you are using this third party module since Python doesn't have a built-in orderedset.

A quick look through the source code on github shows that the issuperset function is implemented as

def issuperset(self, other):
    return other <= self

Taking a look at how the less than or equal operator is defined for orderedset:

def __le__(self, other):
    if isinstance(other, _OrderedSet):
        return len(self) <= len(other) and list(self) <= list(other)

So essentially, when comparing two ordered sets they are first converted to lists, and then the Python built-in <= is used to compare the two lists. When you compare two lists using <=, it's similar to a lexical string comparison meaning it compares matching indices of both lists.

According to their implementation, [433, 316, 259] is a superset of [433, 316, 69] (all matching indices of the first list are greater that or equal of the second list), whereas [433, 316, 259] would not be a superset of [433, 316, 260] (the second list has a bigger value in its last index than the first list).

What they probably wanted to write was

return len(self) <= len(other) and set(self) <= set(other)

Which would use the <= defined for the built-in Python sets, and properly test for subsets and supersets.

You may want to report this issue here.