Spark's takeSample() results in two stages


I've observed interesting behavior in Spark 1.3.1, the reason for which is not clear.

Doing something as simple as sc.textFile("...").takeSample(...) always results in two stages.

Answer by Justin Pihony (best answer):

I was able to reproduce this. The key is to expand the details for each stage: the two stages have different line numbers for their call sites within takeSample. The first is line 428, which is a call to count. Since count is an action, it triggers a stage on its own. The second is line 447, which is the call to sample itself. This might be confusing and could possibly be fixed, but I wouldn't expect it to be a high priority.
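To make the count-then-sample structure concrete, here is a minimal pure-Python sketch. It is a toy model, not Spark's code: `ToyRDD`, `sample_and_collect`, `take_sample`, and the oversampling factor of 3 are all hypothetical names and values chosen for illustration. It only records each full pass over the data, which is roughly what shows up as a separate entry in the UI.

```python
import random


class ToyRDD:
    """Hypothetical stand-in for an RDD (not Spark's API).

    It logs every full pass over the data in `passes`.
    """

    def __init__(self, data):
        self.data = list(data)
        self.passes = []  # names of the operations that scanned the data

    def count(self):
        # First pass: the sampler needs the total size to pick a fraction.
        self.passes.append("count")
        return len(self.data)

    def sample_and_collect(self, fraction, rng):
        # Separate pass: keep each element with probability `fraction`.
        self.passes.append("sample")
        return [x for x in self.data if rng.random() < fraction]

    def take_sample(self, num, seed=0):
        rng = random.Random(seed)
        total = self.count()
        if total == 0 or num <= 0:
            return []
        num = min(num, total)
        # Oversample (assumed factor of 3) so one pass usually suffices.
        fraction = min(1.0, 3.0 * num / total)
        samples = self.sample_and_collect(fraction, rng)
        while len(samples) < num:
            # Rare case: the sample came back too small, so scan again.
            samples = self.sample_and_collect(fraction, rng)
        rng.shuffle(samples)
        return samples[:num]


rdd = ToyRDD(range(1000))
picked = rdd.take_sample(10, seed=42)
print(len(picked), rdd.passes[:2])
```

In this toy model, `rdd.passes` always begins with a `count` entry followed by a `sample` entry, which mirrors the two separate entries described above. If the first sample comes back too small, further `sample` passes are appended.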