In my notebook, I have set up a logging utility so that I can debug DSX scheduled notebooks:
# utility method for logging
# sc is the SparkContext provided by the pyspark notebook environment
log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger("CloudantRecommender")

def info(*args):
    # sends output to the notebook
    print(args)
    # sends output to the kernel log file
    LOGGER.info(args)
Using it like so:
info("some log output")
If I check the log files I can see my log output is getting written:
! grep 'CloudantRecommender' $HOME/logs/notebook/*pyspark*
kernel-pyspark-20170105_164844.log:17/01/05 10:49:08 INFO CloudantRecommender: [Starting load from Cloudant: , 2017-01-05 10:49:08]
kernel-pyspark-20170105_164844.log:17/01/05 10:53:21 INFO CloudantRecommender: [Finished load from Cloudant: , 2017-01-05 10:53:21]
However, when the notebook runs as a scheduled job, the log output doesn't seem to be written to the kernel-pyspark-*.log file.
How can I write log output in DSX scheduled notebooks for debugging purposes?
The logging code actually works fine. The problem was that the schedule was pointing to an older version of the notebook, one that did not contain any logging statements!
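One way to guard against this in future (a minimal sketch; the version string and marker name are my own, not part of DSX) is to log an explicit version marker at the top of the notebook, using the info helper defined above, so the kernel log shows which revision the schedule actually executed:

# hypothetical version marker: bump this string whenever you save the notebook,
# then grep the kernel-pyspark-*.log for it after the scheduled run
NOTEBOOK_VERSION = "2017-01-06-1"

info("CloudantRecommender notebook version:", NOTEBOOK_VERSION)

If the marker in the log doesn't match the version you just saved, the schedule is running a stale copy of the notebook.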