I have an issue Ive been looking into for a while and can't figure it out...any help is appreciated,

We run redis, resque and resque-scheduler on Heroku. Many of the scheduled jobs do not get run. The /resque/failed web page shows the error indicating that rails was never loaded in the worker process. Many of the errors do not make it into the /resque/failed page listing. However, many times the jobs to run.

I also see in /resque/workers more workers than I have running in Heroku. I cannot explian this. I kill the workers from the the rails console, restart the Heroku workers and it all looks good but eventually more workers appear in the /resque/workers page. NOTE: these workers do NOT appear as a result of a Heroku restart (which it does every 24 hours). After I saw the appearance of a 3rd worker (I run 2 in Heroku) I did a "heroku ps" which showed the uptime for the 2 Heroku works to be approx. 7 hours.

I suspect that the mechanism that resque-scheduler uses to communicate with waiting and ready worker processes is broken in some way. And that the extra workers are defunct rails processes that are still "seen" by the resque mechanism (i.e., there are entries in Redis for these defunct workers).

More details: -I noticed there were 6 workers displayed in the /resque/workers web page and so I used the Rails console to kill all the workers. Then I restarted the workers in Heroku. That did nothing so I restarted the resque-scheduler process in Heroku. Still no chnage however the /resque/workers page shows the correct number of resque workers (that number matching the number of heroku worker processes). -about 12 hours later I noticed there appeared more workers in the /resque/workers page. WTF? -some jobs do get processed, and I see these in the completed totals for each worker (i.e., all the workers show non-zero num_processed values). -I do see the logs showing sometimes the workers get run.

Is there an obvious issue here? Can someone describe the keys/entries structure in Redis that are relavent to resque-scheduler communicating with the resque workers? I noted resque is no longer compatable with ruby 1.9.2 but I dont see that as a probable cause since the problem is intermittent. And I realize we use v. 1 of resque not v. 2.

I read through this similar looking issue: Resque on Heroku cedar stack Worker count still exists after the worker terminate but that it not my issue (as I dont pass ENV variables on the command line to heroku workers/aka resque worker processes).

Although it may be related to this: https://github.com/resque/resque/issues/319#issuecomment-1789239 I dont pass any ENV vars to my worker's rake task command line.

Errors and Logging info:

For every job scheduled/queued by resque-scheduler I do see a log entry showing its doing it and doing it at the correct time, for example:

    2014-12-08T22:50:00.120565+00:00 app[scheduler.1]: [INFO] 2014-12-08 22:50:00 +0000: queueing CitrixWorker (CitrixWorker)
    2014-12-08T22:50:00.127388+00:00 app[scheduler.1]: [INFO] 2014-12-08 22:50:00 +0000: queueing InstagramFeed (InstagramFeed)
    2014-12-08T22:50:00.124540+00:00 app[scheduler.1]: [INFO] 2014-12-08 22:50:00 +0000: queueing PageSync (PageSync)

Much of the time I never see the worker run. Sometimes I do (again, from the logs):

    2014-12-08T22:50:00.367083+00:00 app[worker.3]: DEBUG CitrixWorker.update_webinars: starting at 2014-12-08 22:50:00 +0000

Example error in /resque/failed: Exception NameError Error uninitialized constant WorkerProcess => indicating Rails was never loaded (or is no longer loaded) in the resque worker process.

Environment and Versions:

Ruby 1.9.2
Rails 3.1.11
the workers and resque-scheduler all run as Heroku Dynos.

Gems: redis-3.0.6 resque-1.25.1 resque-scheduler-2.3.1

Procfile (used by Heroku): web: bin/start-nginx bundle exec unicorn --env $RACK_ENV -c ./config/unicorn.rb worker: bundle exec rake jobs:work scheduler: bundle exec rake resque:scheduler

Rake task for workers:

resque.rake:

    require 'resque/tasks'
    require 'resque/server'
    require 'resque_scheduler/tasks'
    desc 'Called by the resque:work rake task to initialize each worker process (I think)'
    task "resque:setup" => :environment do
      ENV['QUEUE'] = '*' # watch one or all queues (priority of these queues is in alphabetical order when * is specified)
      ENV['INTERVAL'] = '1.0' # lower from default 5 second wait time (polling frequency) 5 gives same erroneous results as 1 second
      puts "INFO: waiting on #{ENV['QUEUE'] == '*' ? 'ALL' : ENV['QUEUE']} Resque queues".info
    end

    desc "Alias for resque:work (To run workers on Heroku using standard worker syntax, which is 'rake jobs:work')"

task "jobs:work" => "resque:work"

-- Mike Papper

0

There are 0 answers