After some research, I found more problems. Below is a more detailed example:
- upload a list of URLs and assign one job_id to all of them (I need to generate a queue name dynamically so the queue can be purged later).
- use Celery tasks to crawl each URL, e.g. `extract.delay(job_id, url)`, and save the result to the DB. There may be many jobs (job1, job2, job3, ...), and all tasks in all jobs are the same `extract` task; just one worker should process all of the queues (how? I cannot tell the worker every queue name in advance).
- check the DB with something like `select count(id) from xxx where job_id = yyy` and compare it to `len(urls)`, or find some other way for Celery to tell me that job_id `yyy` has finished.
- show the job status (running or complete) on the website, and allow purging a job's queue from the web.
I have never met this situation before; does Celery have an easy way to solve my problem?
I need to add jobs dynamically, and one job contains a lot of tasks, all of them the same. How can I give different jobs different queue names, and have just one worker process all the queues, programmatically?
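Roughly, this is what I have in mind; the sketch below is only a guess (the `job_<job_id>` queue naming, the broker URL, and the purge approach are assumptions, not something I know to be correct):

```python
# My guess at the setup: per-job queue names like "job_<job_id>", one worker
# that is told about new queues at runtime, and a purge helper for the web UI.
from celery import Celery

app = Celery("scraper", broker="redis://localhost:6379/0")  # assumed broker


@app.task
def extract(job_id, url):
    ...  # crawl the URL and save the result to the DB


def start_job(job_id, urls):
    queue_name = f"job_{job_id}"
    # ask the single running worker to also consume from this new queue
    app.control.add_consumer(queue_name, reply=True)
    for url in urls:
        extract.apply_async(args=(job_id, url), queue=queue_name)


def purge_job(job_id):
    queue_name = f"job_{job_id}"
    # stop consuming from the queue, then drop whatever is still waiting in it
    app.control.cancel_consumer(queue_name, reply=True)
    with app.connection_for_write() as conn:
        conn.default_channel.queue_purge(queue_name)
```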
I don't know the details of your web app, but this can be pretty straightforward.
(Using Django syntax)
You could make two models/DB tables: one to represent your batch, and one to represent each URL job.
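A minimal sketch of what those two models could look like (the `ScrapeBatch` name and the exact fields are assumptions; only `ScrapeJob` is named below):

```python
# models.py -- sketch; field names and the ScrapeBatch model name are assumptions
from django.db import models


class ScrapeBatch(models.Model):
    """One batch of submitted URLs (what the question calls a job)."""
    created_at = models.DateTimeField(auto_now_add=True)


class ScrapeJob(models.Model):
    """A single URL to crawl, belonging to one batch."""
    batch = models.ForeignKey(ScrapeBatch, on_delete=models.CASCADE, related_name="jobs")
    url = models.URLField()
    done = models.BooleanField(default=False)
```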
Then, when you run your Celery tasks, use the `ScrapeJob` model as your reference and mark each job's row as done when it finishes.
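For example (a sketch, assuming the fields above; note that here the task receives the `ScrapeJob` primary key rather than `(job_id, url)` as in the question):

```python
# tasks.py -- sketch; the task signature is an assumption
from celery import shared_task

from .models import ScrapeJob


@shared_task
def extract(scrape_job_id):
    job = ScrapeJob.objects.get(pk=scrape_job_id)
    # ... crawl job.url and save whatever you extract to the DB ...
    job.done = True
    job.save(update_fields=["done"])
```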
In your web view, you could then simply check whether all of your batch's jobs are done or not:
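A sketch of such a check, reusing the `done` flag and the `jobs` related name assumed above (the URL routing and template name are placeholders):

```python
# views.py -- sketch; the template and URL pattern are assumptions
from django.shortcuts import get_object_or_404, render

from .models import ScrapeBatch


def batch_status(request, batch_id):
    batch = get_object_or_404(ScrapeBatch, pk=batch_id)
    finished = not batch.jobs.filter(done=False).exists()
    return render(request, "batch_status.html", {
        "batch": batch,
        "status": "complete" if finished else "running",
    })
```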
So in summary:
- A DB table that holds your URLs
- A DB table to hold something like a batch number (with a foreign key relation to your URL table)