I am working with Hadoop 1.1.2 for my master's thesis.
I am studying a new algorithm for speculative tasks, so as a first step I am trying to make some changes to the code.
Unfortunately, even with 2 nodes, I cannot trigger speculative execution. I added some logging statements to the class DefaultTaskSelector (the class responsible for speculative tasks), but after initialization this class is never called by the FairScheduler class.
I also enabled the "speculative" option in the config file (mapred-site...xml), but nothing happened.
So my question is: how can I trigger, or force, speculative execution?
Regards
Speculative execution typically happens when multiple mappers are running and one or more of them lag behind the others. A good way to get it to happen:
Now you may see speculative execution run.
If not, feel free to come back here. I can provide further suggestions (e.g. writing some moderately complicated queries that would likely induce speculative execution).
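Whether a lagging task actually gets a speculative copy depends on the scheduler's straggler test. The plain-Java sketch below models a simplified progress-gap rule: a running task is a candidate when its progress trails the average of its siblings by a fixed gap and it has been running for a minimum time. The class, method names and both thresholds are hypothetical illustrations, not necessarily the exact rule Hadoop 1.1.2 applies.

```java
import java.util.ArrayList;
import java.util.List;

public class StragglerCheck {
    // Hypothetical thresholds, chosen only for illustration.
    static final double PROGRESS_GAP = 0.2;      // how far behind the average a task must be
    static final long MIN_RUNTIME_MS = 60_000L;  // how long a task must have been running

    static final class Task {
        final String id;
        final double progress;   // 0.0 .. 1.0
        final long startTimeMs;

        Task(String id, double progress, long startTimeMs) {
            this.id = id;
            this.progress = progress;
            this.startTimeMs = startTimeMs;
        }
    }

    /** Returns the ids of running tasks that this simplified rule would speculate. */
    static List<String> speculationCandidates(List<Task> running, long nowMs) {
        double avg = running.stream().mapToDouble(t -> t.progress).average().orElse(0.0);
        List<String> candidates = new ArrayList<>();
        for (Task t : running) {
            boolean lagging = avg - t.progress >= PROGRESS_GAP;
            boolean runningLongEnough = nowMs - t.startTimeMs >= MIN_RUNTIME_MS;
            if (lagging && runningLongEnough) {
                candidates.add(t.id);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<Task> running = List.of(
            new Task("m_000", 0.90, 0L),
            new Task("m_001", 0.85, 0L),
            new Task("m_002", 0.30, 0L),        // far behind and old enough
            new Task("m_003", 0.10, 100_000L)); // far behind but only 20 s old
        System.out.println(speculationCandidates(running, 120_000L)); // [m_002]
    }
}
```

Under a rule of this shape, a short job on two nodes often finishes before any task is both far enough behind and old enough, which is why long-running, uneven work makes speculation much more likely to show up.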
EDIT
Hive may be a bit of a stretch for you, but you can apply the "spirit" of the strategy to regular HDFS files as well. Write a MapReduce program with a custom partitioner that is intentionally skewed, i.e. one that sends an outsized share of the map output to a single reducer, so that one reduce task does far more work than the others.
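A partitioner decides which reduce task receives each map-output key, so an intentionally skewed one overloads a single reducer. The sketch below is plain Java so it runs without Hadoop on the classpath; its `getPartition(key, value, numPartitions)` signature mirrors the one on Hadoop's `Partitioner`, and in a real job the body would go into a subclass of `org.apache.hadoop.mapreduce.Partitioner` registered with `job.setPartitionerClass(...)`. The class name `SkewedPartitioner` and the roughly 80% "hot" share are hypothetical choices.

```java
import java.util.HashMap;
import java.util.Map;

public class SkewedPartitioner {
    // Hypothetical knob: keys whose hash falls in 8 of 10 buckets go to partition 0.
    static final int HOT_BUCKETS_OUT_OF_10 = 8;

    static int getPartition(String key, String value, int numPartitions) {
        if (numPartitions <= 1) {
            return 0;
        }
        // Non-negative hash, the same masking trick Hadoop's HashPartitioner uses.
        int h = key.hashCode() & Integer.MAX_VALUE;
        if (h % 10 < HOT_BUCKETS_OUT_OF_10) {
            return 0; // the overloaded "hot" reducer
        }
        // Spread the remaining keys over partitions 1..numPartitions-1.
        return 1 + (h % (numPartitions - 1));
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            int p = getPartition("key-" + i, "", numPartitions);
            counts.merge(p, 1, Integer::sum);
        }
        // Partition 0 should receive the large majority of the keys.
        System.out.println(counts);
    }
}
```

Because the skew lives in the partitioner, the map phase stays balanced and the straggler appears in the reduce phase, so reduce-side speculation is what this setup exercises.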
Remember to use input spanning at least some tens of HDFS blocks, so the TaskTrackers have a decent amount of work to chew on.
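To get a feel for how much input "some tens of HDFS blocks" is: with `FileInputFormat`, each split (and so each map task) normally covers one block, and the last split may be up to 10% larger than the others (the `SPLIT_SLOP` constant in `FileInputFormat`). The plain-Java sketch below reproduces that split arithmetic, assuming a single non-empty, splittable file, a split size equal to the block size, and the Hadoop 1.x default block size of 64 MB. `SplitEstimate` and `estimateSplits` are hypothetical names.

```java
public class SplitEstimate {
    // FileInputFormat lets the last split be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    /** Approximate number of map tasks for one non-empty, splittable file. */
    static int estimateSplits(long fileBytes, long splitBytes) {
        int splits = 0;
        long remaining = fileBytes;
        while ((double) remaining / splitBytes > SPLIT_SLOP) {
            splits++;
            remaining -= splitBytes;
        }
        if (remaining != 0) {
            splits++; // the (possibly slightly oversized) tail split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // assumed Hadoop 1.x default block size
        System.out.println(estimateSplits(2560 * mb, blockSize)); // 40
        System.out.println(estimateSplits(130 * mb, blockSize));  // 2
    }
}
```

So around 40 map tasks need roughly 2.5 GB of input at the default block size.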