I'm running a simple jar through Spark and everything works fine. As a crude way to debug I often find println pretty helpful, unless I really have to attach a debugger. However, output from println statements is nowhere to be found when the jar is run under Spark.
The main class in the jar begins like this:
import ...

object SimpleApp {
  def main(args: Array[String]) {
    println("Starting up!")
    ...
Why does something as simple as this not show up in the driver process?
If it matters, I've tested this running Spark locally as well as under Mesos.
Update:
As explained in Proper way to provide spark application a parameter/arg with spaces in spark-submit, I've dumbed down the question scenario here: I was actually submitting the command (with spark-submit) over SSH.
The actual parameter value was a query from the BigDataBenchmark, namely:
"SELECT pageURL, pageRank FROM rankings WHERE pageRank > 1000"
Now, that wasn't properly escaped in the remote ssh command:
ssh host spark-submit ... "$query"
Became, on the host:
spark-submit ... SELECT pageURL, pageRank FROM rankings WHERE pageRank > 1000
So there you have it: the unescaped > 1000 was interpreted by the remote shell as an output redirection, so all my stdout was going to a file (named 1000), whereas the "normal" Spark output was still appearing because it goes to stderr, which I only now realise.
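For the record, escaping the inner quotes so they survive local expansion should fix it. A minimal sketch, assuming the remote shell is POSIX-like and the query contains no double quotes of its own (the host and the elided spark-submit arguments are placeholders, as above):

# $query is expanded locally inside the outer quotes; the escaped inner quotes
# reach the remote shell, so > stays inside a quoted argument instead of
# becoming a redirection
ssh host "spark-submit ... \"$query\""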
This would appear in the stdout of the driver; see SparkPi for an example. On YARN it appears locally on stdout in client mode, or in the application master's stdout log in cluster mode. In local mode it should appear on the normal stdout as well (though likely mixed in with a lot of logging noise).
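If you're on YARN in cluster mode and log aggregation is enabled, you can usually pull the application master's logs (println output included) back afterwards with something like:

# Fetch the aggregated logs for a finished application
yarn logs -applicationId <application_id>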