Importing data that may take 10-15 minutes to process, what are my options in Rails?

I have a Rails application that displays thousands of products. The products are loaded from product feeds, so the source may be a large XML file or web service API calls.

I want to be able to re-use the models from my existing Rails application in the import process.

What are my options for importing data into my Rails application?

  1. I could use Sidekiq to fire off rake tasks, but I'm not sure whether Sidekiq is suitable for tasks that take 10+ minutes to run. Most use cases I have seen are for sending emails and other similarly light tasks.

  2. I could create a stand-alone Ruby script, but I'm not sure how I could re-use my Rails models if I go this route.

Update: My total product count is around 30-50K items.

There is 1 answer

Philip Hallstrom

Sidekiq would be a great option for this, as others have mentioned. 10+ minutes isn't unreasonable, as long as you understand that if you restart your Sidekiq process mid-run, that job will be stopped as well.

The concern I have is that if you are importing 50K items in a single job and you hit a failure near the beginning, you'll never get to the later ones. I would suggest looking at your import routine and seeing if you can break it up into smaller components. Something like this:

  • Start the Sidekiq import job.
  • The first thing the job does is reschedule itself to run N hours later.
  • Fetch the data from the API/XML.
  • For each record in that result, schedule an "import this specific data" job with the data as an argument.
  • Done.

The key is the second-to-last step. By doing it this way, your primary job has a much better chance of succeeding, as all it is doing is reading the API/XML and scheduling 50K more jobs. Each of those can run individually, and if a single one fails it won't affect the others.
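
Here is a rough sketch of that fan-out, written so it runs without Sidekiq or Redis. `InlineQueue` is a stand-in I made up to imitate `perform_async`; in a real app each class would `include Sidekiq::Job` (`Sidekiq::Worker` on older versions) instead. The class names `ImportFeedJob` and `ImportProductJob` and the `sku` field are hypothetical, and the "reschedule itself" step is left out (in Sidekiq that would be a `perform_in` call at the top of the parent job).

```ruby
require "json"

# Stand-in for Sidekiq's queue so this sketch runs without Redis.
module InlineQueue
  JOBS = []

  def perform_async(*args)
    # Sidekiq serializes job arguments as JSON, so round-trip them here too.
    InlineQueue::JOBS << [self, JSON.parse(JSON.generate(args))]
  end
end

# Hypothetical per-record job: imports exactly one product.
class ImportProductJob
  extend InlineQueue
  IMPORTED = {}

  def perform(attrs)
    raise ArgumentError, "missing sku" if attrs["sku"].nil?
    IMPORTED[attrs["sku"]] = attrs
  end
end

# Hypothetical parent job: reads the feed and schedules one job per record.
class ImportFeedJob
  extend InlineQueue

  def perform(feed)
    feed.each { |record| ImportProductJob.perform_async(record) }
  end
end

# Run everything in the queue, collecting failures instead of aborting.
def drain_queue
  failures = []
  until InlineQueue::JOBS.empty?
    job_class, args = InlineQueue::JOBS.shift
    begin
      job_class.new.perform(*args)
    rescue StandardError => e
      failures << [job_class, args, e.message]
    end
  end
  failures
end

feed = [
  { "sku" => "A1", "name" => "Kettle" },
  { "name" => "Broken record without a SKU" },
  { "sku" => "C3", "name" => "Toaster" }
]
ImportFeedJob.perform_async(feed)
failures = drain_queue
puts "imported: #{ImportProductJob::IMPORTED.keys.inspect}, failed: #{failures.length}"
```

The broken record in the middle fails on its own, and the records before and after it still get imported, which is the whole point of splitting the work up.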

The other thing to remember is that, unless you configure it not to, Sidekiq will rerun failed jobs. So make sure that the "import this specific data" job can be run multiple times and still do the right thing.
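
To illustrate what "run multiple times and still do the right thing" can look like, here is a minimal sketch that keys each write on a unique identifier. `ProductStore` is an in-memory stand-in I invented so the example runs on its own, and the `sku` key is an assumption about your data. In a Rails app the equivalent would be something along the lines of `Product.find_or_initialize_by(sku: ...)` followed by `update!`, assuming your model is called `Product` and has a unique SKU column.

```ruby
# In-memory stand-in for a products table with a unique key on "sku".
class ProductStore
  def initialize
    @rows = {}
  end

  # Insert or update by SKU, so re-running with the same data is harmless.
  def upsert(attrs)
    sku = attrs.fetch("sku")
    @rows[sku] = (@rows[sku] || {}).merge(attrs)
  end

  def count
    @rows.size
  end

  def find(sku)
    @rows[sku]
  end
end

store = ProductStore.new
record = { "sku" => "A1", "name" => "Kettle", "price_cents" => 2499 }

# Simulate Sidekiq retrying the same job: the second run must not duplicate.
2.times { store.upsert(record) }

puts store.count # => 1
```

If you would rather not have retries at all for a given job, Sidekiq lets you set `sidekiq_options retry: false` on the job class, but an import that is safe to rerun is usually the more forgiving choice.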

I have a very similar setup that has worked well for me for two years.