In google's dremel, what algorithm that top-k query makes use of?

266 views Asked by At

Google's Dremel algorithm supports top-k queries. Could somebody tell me what algorithm that top-k query makes use of?

2

There are 2 answers

0
lavin On

like a Heap ?

A heap can be used to answer query asking for top k elements in a sorted list, in a O(nlogk) time.

see http://stevehanov.ca/blog/index.php?id=122

0
user152468 On

I guess you know about the Dremel Paper?

Here is a link: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf

It says:

Some Dremel queries, such as top-k and count-distinct, return approximate results using known one-pass algorithms (e.g., [4]).

The reference is the following:

[4] Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar, and L. Trevisan. Counting Distinct Elements in a Data Stream. In RANDOM, pages 1–10, 2002.

Does this help?