Get feedback from a scheduled job while it is processed

601 views Asked by At

I would like to run jobs, but as they may be long, I would like to know how far they have been processed during their execution. That is, the executor would regularly return its progress, without ending the job it is executing. I have tried to do this with APScheduler, but it seems the scheduler can only receive event messages like EVENT_JOB_EXECUTED or EVENT_JOB_ERROR.

Is it possible to get information from an executor while it is executing a job?

Thanks in advance!

1

There are 1 answers

0
Sopoforic On

There is, I think, no particular support for this within APScheduler. This requirement has come up for me many times, and the best solution will depend on exactly what you need. Some possibilities:

Job status dictionary

The simplest solution would be to use a plain python dictionary. Make the key the job's key, and the value whatever status information you require. This solution works best if you only have one copy of each job running concurrently (max_instances=1), of course. If you need some structure to your status information, I'm a fan of namedtuples for this. Then, you either keep the dictionary as an evil global variable or pass it into each job function.

There are some drawbacks, though. The status information will stay in the dictionary forever, unless you delete it. If you delete it at the end of the job, you don't get to read a 'job complete' status, and otherwise you have to make sure that whatever is monitoring the status definitely checks and clears every job. This of course isn't a big deal if you have a reasonable sized set of jobs/keys.

Custom dict

If you need some extra functions, you can do as above, but subclass dict (or UserDict or MutableMapping, depending on what you want).

Memcached

If you've got a memcached server you can use, storing the status reports in memcached works great, since they can expire automatically and they should be globally accessible to your application. One probably-minor drawback is that the status information could be evicted from the memcached server if it runs out of memory, so you can't guarantee that the information will be available.

A more major drawback is that this does require you to have a memcached server available. If you might or might not have one available, you can use dogpile.cache and choose the backend that's appropriate at the time.

Something else

Pieter's comment about using a callback function is worth taking note of. If you know what kind of status information you'll need, but you're not sure how you'll end up storing or using it, passing a wrapper to your jobs will make it easy to use a different backend later.

As always, though, be wary of over-engineering your solution. If all you want is a report that says "20/133 items processed", a simple dictionary is probably enough.