I've observed interesting behavior in Spark 1.3.1, the reason for which is not clear.
Doing something as simple as sc.textFile("...").takeSample(...)
always results in two stages:
I was able to reproduce this. The key is to look at the details expansion for each stage. The first and second stages show different line numbers for their calls within takeSample. The first is line 428, which is a call to count; count is an action, which is why it triggers a stage on its own. The second is line 447, which is the call to sample itself. This might be confusing and could possibly be fixed, but I wouldn't imagine it to be a high priority.
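
To make the two-pass structure concrete, here is a minimal sketch in plain Python. It does not use Spark. CountingDataset and take_sample are hypothetical names invented for this illustration, and the oversampling factor of 2 is an assumption rather than the value Spark computes. It only models the idea described above: the data is first counted, then sampled in a separate pass.

```python
import random


class CountingDataset:
    """Toy stand-in for an RDD (hypothetical, not a Spark class).

    It records the name of every operation that scans the full data,
    mimicking how each action shows up separately in the Spark UI.
    """

    def __init__(self, items):
        self._items = list(items)
        self.passes = []  # names of the full scans, in order

    def count(self):
        self.passes.append("count")
        return sum(1 for _ in self._items)

    def sample(self, fraction, rng):
        self.passes.append("sample")
        # Keep each element independently with probability `fraction`.
        return [x for x in self._items if rng.random() < fraction]


def take_sample(data, num, seed=0):
    """Return `num` elements chosen without replacement (simplified model)."""
    rng = random.Random(seed)

    # Pass 1: the size must be known to turn `num` into a sampling fraction.
    total = data.count()
    if num <= 0 or total == 0:
        return []
    num = min(num, total)

    # Assumption: oversample by a factor of 2 to reduce the chance of a retry.
    fraction = min(1.0, 2.0 * num / total)

    # Pass 2 (repeated only if the sample came back too small).
    picked = data.sample(fraction, rng)
    while len(picked) < num:
        picked = data.sample(fraction, rng)

    rng.shuffle(picked)
    return picked[:num]


if __name__ == "__main__":
    data = CountingDataset(range(1000))
    print(take_sample(data, 10, seed=42))
    print(data.passes)
```

In this model, data.passes always starts with "count" and is followed by at least one "sample", which mirrors the two entries seen in the UI for a single takeSample call.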