How to efficiently find top-k elements?


I have a big sequence file storing the tfidf values for documents. Each line represents a document and the columns are the tfidf values for each term (each row is a sparse vector). I'd like to pick the top-k words for each document using Hadoop. The naive solution is to loop through all the columns for each row in the mapper and pick the top-k, but as the file grows bigger and bigger I don't think this is a good solution. Is there a better way to do that in Hadoop?
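For reference, the per-row selection could look something like this minimal sketch in plain Java (the term-to-score map representation of a sparse row and the names here are just assumptions for illustration): keep a size-bounded min-heap while scanning the row, so each document costs O(n log k) instead of a full sort of its columns.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class RowTopK {
    // Keep the k largest (term, tfidf) pairs of one sparse row with a
    // size-bounded min-heap: O(n log k) instead of sorting the whole row.
    static List<Map.Entry<String, Double>> topK(Map<String, Double> row, int k) {
        PriorityQueue<Map.Entry<String, Double>> heap =
                new PriorityQueue<>(k, (a, b) -> Double.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Double> e : row.entrySet()) {
            if (heap.size() < k) {
                heap.offer(new AbstractMap.SimpleEntry<>(e));
            } else if (e.getValue() > heap.peek().getValue()) {
                heap.poll();  // evict the current smallest of the kept k
                heap.offer(new AbstractMap.SimpleEntry<>(e));
            }
        }
        return new ArrayList<>(heap);  // the k largest, in no particular order
    }
}
```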


1 Answer

KrazyGautam
 1. In every mapper, compute a local top K (this is the top K of just that mapper's input split).
 2. Spawn a single reducer; the local top-K lists from all mappers flow to that one reducer, which merges them into the global top K (see the sketch below the analogy).

Think of the problem this way:

 1. You have been given the results of X horse races.
 2. You need to find the top N fastest horses.
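A rough sketch of that two-step pattern in Hadoop MapReduce might look like the code below. Everything here is illustrative rather than taken from the answer: the class names, the assumed "term:tfidf" line format, and the hard-coded K. The key idea is that each mapper emits only its local top K in cleanup(), and a single reducer (job.setNumReduceTasks(1)) merges those few records into the global top K.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class GlobalTopK {
    static final int K = 10;  // assumed k; a real job would read it from the Configuration

    public static class TopKMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private final TreeMap<Double, String> local = new TreeMap<>();  // score -> term

        @Override
        protected void map(LongWritable key, Text value, Context ctx) {
            // Assumed line format: "term:tfidf term:tfidf ..."
            for (String tok : value.toString().trim().split("\\s+")) {
                String[] parts = tok.split(":");
                if (parts.length != 2) continue;
                local.put(Double.parseDouble(parts[1]), parts[0]);
                if (local.size() > K) local.remove(local.firstKey());  // drop the smallest
            }
        }

        @Override
        protected void cleanup(Context ctx) throws IOException, InterruptedException {
            // Emit only this mapper's local top K, all under one key.
            for (Map.Entry<Double, String> e : local.entrySet()) {
                ctx.write(NullWritable.get(), new Text(e.getValue() + "\t" + e.getKey()));
            }
        }
    }

    public static class TopKReducer extends Reducer<NullWritable, Text, Text, DoubleWritable> {
        @Override
        protected void reduce(NullWritable key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            // Merge the local top-K lists from all mappers into the global top K.
            TreeMap<Double, String> global = new TreeMap<>();
            for (Text v : values) {
                String[] parts = v.toString().split("\t");
                global.put(Double.parseDouble(parts[1]), parts[0]);
                if (global.size() > K) global.remove(global.firstKey());
            }
            for (Map.Entry<Double, String> e : global.descendingMap().entrySet()) {
                ctx.write(new Text(e.getValue()), new DoubleWritable(e.getKey()));
            }
        }
    }

    // Job setup must force a single reducer so the final top K is global:
    // job.setNumReduceTasks(1);
}
```

Only numMappers × K records ever reach that single reducer, so it is not a bottleneck. Note that keying the TreeMap by score silently drops ties (two terms with the same tfidf); a real job would need to handle that.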