How to find the map-side sort time in Hadoop?


I am investigating the performance of our algorithm, which runs on top of Hadoop 2.x. We would like to know how the computation time breaks down into its components:

- map time
- reduce time
- sort time
- shuffle time

On the reduce side, there is a clear distinction in the counters: each of the components (reduce, shuffle, merge) has a separate counter. The map side also sorts its output, but I cannot find any counters related to the sort time or the amount of data sorted. How can I find the map-side sort time?

Thanks.

There is 1 answer

Ramzy On

You are talking about the map-side sort/spill. You can look here for a good presentation on performance at each stage of MapReduce. For more theory, see Hadoop: The Definitive Guide, Chapter 6, "How MapReduce Works", under "Shuffle and Sort", "The Map Side".
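
As far as I know there is no dedicated map-side sort-time counter, so one rough proxy is how often records were spilled to disk. The sketch below is only an illustration, not a time measurement: it assumes you have already copied the "Map output records" and "Spilled Records" counter values of a single map task into a dictionary. The function name, the dictionary keys, and the numbers are hypothetical. A combiner reduces the number of spilled records, so the ratio is only meaningful without one.

```python
def spill_ratio(counters):
    """Return spilled records per map output record for one map task.

    `counters` is a plain dict holding the values of the task counters
    "Map output records" and "Spilled Records" (the keys used here are
    an assumption made for this sketch).
    """
    map_output = counters["MAP_OUTPUT_RECORDS"]
    spilled = counters["SPILLED_RECORDS"]
    if map_output == 0:
        # A task that emitted nothing cannot have spilled anything.
        return 0.0
    return spilled / map_output


# Hypothetical counter values, for illustration only.
example = {"MAP_OUTPUT_RECORDS": 1_000_000, "SPILLED_RECORDS": 3_000_000}
ratio = spill_ratio(example)
```

A ratio close to 1 suggests that each record was written to disk once. A clearly higher ratio suggests several spills followed by extra merge passes, in which case the sort buffer settings (`mapreduce.task.io.sort.mb`, `mapreduce.map.sort.spill.percent`) are worth a look.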