Best way to check whether multiple tasks are done in Celery?


After some research, I found more problems. Below is a more detailed example:

  1. Upload a list of urls and assign a job_id to all of them (a queue name needs to be generated dynamically so the queue can be purged later); see the dispatch sketch after this list.
  2. Use Celery tasks to crawl each url, e.g. extract.delay(job_id, url), and save the results to the db.
  3. (There may be many jobs: job1, job2, job3.) All tasks in all jobs use the same extract task, and just one worker should process all the queues. (How? I cannot list every queue name for the worker.)
  4. Check the db with select count(id) from xxx where job_id = yyy and compare it to len(urls), or use some other way for Celery to tell me that job_id yyy is done.
  5. Show the job status (running or complete) on a website, and allow purging a job's queue from the web.
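For reference, steps 1 and 2 might look roughly like this (a minimal sketch: the Celery app, the broker URL, and the job_<id> queue-naming scheme are all assumptions, not part of the original question):

import uuid

from celery import Celery

app = Celery('crawler', broker='redis://localhost:6379/0')  # broker is an assumption

@app.task
def extract(job_id, url):
    ...  # crawl the url and save the result to the db

def submit_job(urls):
    job_id = uuid.uuid4().hex   # dynamically generated job id
    queue = 'job_%s' % job_id   # one queue per job, so it can be purged later
    for url in urls:
        # route every task of this job to its own queue
        extract.apply_async(args=(job_id, url), queue=queue)
    return job_id, queue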

I have never met this situation before. Does Celery have an easy way to solve my problem?

I need to add jobs dynamically; one job contains many tasks, and all the tasks are the same. How can I give different jobs different queue names, and have just one worker process all the queues, programmatically?
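One way to let a single running worker pick up queues that are created on the fly is Celery's remote-control commands (a sketch; it assumes the worker is already running and that app and queue come from the sketch above):

def start_consuming(app, queue):
    # Ask the running worker(s) to also consume from this job's queue,
    # so one worker can cover queues that did not exist at startup.
    app.control.add_consumer(queue, reply=True)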


1 Answer

Answered by rtpg

I don't know the details of your web app, but this can be pretty straightforward.

(Using Django syntax)

You could make two models/DB tables: one to represent your batch, and one to represent each URL job.

from django.db import models

class ScrapeBatch(models.Model):
    # Django adds an auto-incrementing `id` primary key automatically.
    pass

class ScrapeJob(models.Model):
    batch = models.ForeignKey(ScrapeBatch, on_delete=models.CASCADE)
    url = models.CharField(max_length=100)  # for example
    done = models.BooleanField(default=False)
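At upload time you could create the batch and its jobs together, something like this (a sketch; create_batch is a hypothetical helper name, not part of the answer itself):

from django.db import transaction

def create_batch(urls):
    # Create the batch and one ScrapeJob per uploaded URL atomically.
    with transaction.atomic():
        batch = ScrapeBatch.objects.create()
        ScrapeJob.objects.bulk_create(
            [ScrapeJob(batch=batch, url=url) for url in urls]
        )
    return batch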

Then, when you run your Celery tasks, use the ScrapeJob model as your reference:

from celery import shared_task

@shared_task
def scrape_url_celery_task(job_id):
    job = ScrapeJob.objects.get(id=job_id)
    scrape_url(job)   # your actual scraping/extraction logic
    job.done = True   # mark the URL as scraped
    job.save()
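Kicking off a whole batch could then be as simple as this (a sketch; start_batch is a hypothetical name, and it reuses the create_batch helper assumed above):

def start_batch(urls):
    batch = create_batch(urls)
    # Enqueue one Celery task per URL in the batch.
    for job in batch.scrapejob_set.all():
        scrape_url_celery_task.delay(job.id)
    return batch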

So in your web view, you could simply check whether all of your batch's jobs are done:

def batch_done(batch):
    # The batch is done when no job in it is still pending.
    return not batch.scrapejob_set.filter(done=False).exists()
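A status endpoint for the website could then look something like this (a sketch; the view name and the JSON shape are assumptions):

from django.http import JsonResponse

def batch_status(request, batch_id):
    batch = ScrapeBatch.objects.get(id=batch_id)
    total = batch.scrapejob_set.count()
    done = batch.scrapejob_set.filter(done=True).count()
    return JsonResponse({
        'status': 'complete' if batch_done(batch) else 'running',
        'progress': '%d/%d' % (done, total),
    })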

So in summary:

  • A DB table that holds your URLs
  • A DB table to hold something like a batch number (with foreign key relations to your URL table)
  • Celery marks URLs as scraped in the DB after each task completes
  • A simple search through the URL table tells you whether the job is done. You can show this value on a website.
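For the "purge the job queue from the web" part of the question, one possibility (an assumption, not covered by the answer above) is to stop consuming from the per-job queue and then purge it over the broker connection:

from django.http import JsonResponse
from myproject.celery import app  # your Celery app instance (name is an assumption)

def purge_batch_queue(request, batch_id):
    queue = 'job_%s' % batch_id  # per-job queue name from the earlier sketch
    app.control.cancel_consumer(queue, reply=True)  # stop workers reading the queue
    with app.connection_for_write() as conn:
        purged = conn.default_channel.queue_purge(queue)  # drop waiting tasks
    return JsonResponse({'purged': purged})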