Create dictionary from a list of lists by using maxheap in Python

161 views Asked by At

What I know is that creating a maxheap in Python is below: The syntax of heapq.heappush(heap, value)

li = [5, 7, 9, 1, 3]

maxheap = [-val for val in li]
heapq.heapify(maxheap)

heapq.heappush(maxheap, -10)         
print('maxheap = ', maxheap)       # [-10, -7, -9, -1, -3, -5]

I found a famous interview question and one of its solution.

"Given a list of the scores of different students, items, where items[i] = [IDi, scorei] represents one score from a student with IDi, calculate each student's top five average. Return the answer as an array of pairs result, where result[j] = [IDj, topFiveAveragej] represents the student with IDj and their top five average. Sort result by IDj in increasing order. A student's top five average is calculated by taking the sum of their top five scores and dividing it by 5 using integer division."

I saw one of solutions using maxheap and a part of it is below.

# items = [[1,91],[1,92],[2,93],[2,97],[1,60],[2,77],[1,65],[1,87],[1,100],[2,100],[2,76]]

def highFive(self, items):
    seen = defaultdict(list)
    
    for s_id, score in items:
        heapq.heappush(seen[s_id], -score)
    
    # output seen is {1: [-100, -91, -92, -65, -87, -60], 2: [-100, -97, -77, -93, -76]})
    print("seen = ", seen) 
    ...

This short code is mysterious for me (I'm a Java Dev. and started learning Python recently and my question is not about the algorithm of this solution ).

I know how to create a maxhep for a given integer list and how to create dictionary using defaultdict respectively, but don't understand how this short code doing altogether without even writing some codes which are used when create a dictionary and heap?

First, this code doesn't even write heapq.heapify(seen) to create maxheap. Simply create default empty dictionary seen, and then looping input items and directly heapq.heappush(seen[s_id], -score)

Second, I printed out seen[s_id] in each iteration. How come dictionary is created after seen[s_id] even without append() method like seen[s_id].append(score)?

for s_id, score in items:
    id_ = seen[s_id]
    print("\nseen[{0}] = {1}".format(s_id, id_))

    heapq.heappush(id_, -score)         
    ...

output: 
seen[1] = []

seen[1] = [-91]

seen[2] = []

seen[2] = [-93]

seen[1] = [-92, -91]

seen[2] = [-97, -93]

seen[1] = [-92, -91, -60]

seen[1] = [-92, -91, -60, -65]

seen[1] = [-92, -91, -60, -65, -87]

seen[2] = [-97, -93, -77]

seen[2] = [-100, -97, -77, -93]
1

There are 1 answers

0
Stef On

First, this code doesn't even write heapq.heapify(seen) to create maxheap. Simply create default empty dictionary seen, and then looping input items and directly heapq.heappush(seen[s_id], -score)

heapq.heapify(myList) changes the order of the elements in the list myList so that it satisfies the properties of a heap. However, if myList is an empty list, then this operation is not needed. The empty list already satisfies the properties of a heap.

Since seen is a defaultdict and not just a regular dict, an entry for seen[s_id] is created automatically, and initialised to an empty list, if this entry doesn't exist already.

Second, I printed out seen[s_id] in each iteration. How come dictionary is created after seen[s_id] even without append() method like seen[s_id].append(score)?

Note that heapq.heappush(seen[s_id], score) and seen[s_id].append(score) are almost equivalent. The only difference is that append adds the element at the end of the list, whereas heapq.heappush adds the element at some position so that the list still satisfies the properties of a heap.

As an illustration, the three following codes produce the same result:

li = [5, 7, 9, 1, 3]

maxheap = [-val for val in li]
heapq.heapify(maxheap)
li = [5, 7, 9, 1, 3]

maxheap = []
for val in li:
  maxheap.append(-val)
heapq.heapify(maxheap)
li = [5, 7, 9, 1, 3]

maxheap = []
for val in li:
  heapq.heappush(maxheap, -val)