I'm dealing with millions of rows of data that I want to load into my Rails application as Models. I'm using MySQL as a database, and I'm on Rails 2.3.14.
One of my co-workers says that it's inadvisable to add records directly to MySQL, bypassing the Rails ActiveRecord system. He's short on specifics, but the gist is that Rails does a lot of "magic" through its ActiveRecord system, and entering data outside of that system will confuse Rails. Can someone elaborate on whether this is accurate?
If I should be loading data into Rails through ActiveRecord, I've read that the activerecord-import plugin is the way to go for this type of job.
Any feedback on the best approach for loading massive amounts of data into Rails would be welcome.
I can think of six main items to consider; the last five relate to Rails 'magic':
Speed. This is huge. One-at-a-time ActiveRecord inserts can take on the order of a second per row, so a million rows would take about a million seconds - that's 11.5 DAYS, which would give the row-by-row approach a bad rap with many folks! Bulk inserts are the fix, as sketched below.
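As a rough sketch of the difference, here is what the activerecord-import gem you mention might look like next to the naive loop (the Book model and the rows array are hypothetical stand-ins for your own data):

    require 'activerecord-import'  # assumes the gem is installed and loaded

    # Hypothetical data source; substitute your own
    rows = [['The Pragmatic Programmer', 1999], ['Eloquent Ruby', 2011]]

    # Slow path: one INSERT (plus validations and callbacks) per row
    rows.each { |title, year| Book.create!(:title => title, :year => year) }

    # Fast path: a single multi-row INSERT for the whole batch
    books = rows.map { |title, year| Book.new(:title => title, :year => year) }
    Book.import(books)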
Validation. You'll need to make sure the database enforces the same rules as the validations in your models, because model-level validations never run on rows inserted outside ActiveRecord, and you'll want to confirm your existing data still satisfies them.
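For example, a presence validation only protects data that comes in through ActiveRecord; to get the same guarantee for directly inserted rows, you'd mirror it with a NOT NULL constraint. A minimal sketch in Rails 2.3-style syntax, with a hypothetical Book model:

    class Book < ActiveRecord::Base
      # Enforced only when a row is saved through ActiveRecord
      validates_presence_of :title
    end

    # Migration mirroring the validation at the database level,
    # so rows inserted with raw SQL can't bypass it
    class AddNotNullToBooksTitle < ActiveRecord::Migration
      def self.up
        change_column :books, :title, :string, :null => false
      end

      def self.down
        change_column :books, :title, :string, :null => true
      end
    end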
Timestamps. You'll need to set created_at and updated_at manually if you want them populated the same way Rails would populate them.
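A minimal sketch, again assuming a hypothetical books table: when you insert with raw SQL, the timestamp columns must be supplied explicitly, because the magic that fills them in lives in ActiveRecord, not in MySQL:

    # Format the current UTC time the way MySQL expects it
    now = Time.now.utc.strftime('%Y-%m-%d %H:%M:%S')

    # Raw INSERT: created_at / updated_at won't be set unless you set them
    ActiveRecord::Base.connection.execute(
      "INSERT INTO books (title, created_at, updated_at) " +
      "VALUES ('Some Title', '#{now}', '#{now}')"
    )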
Counter Caches. Counter cache columns aren't maintained for rows inserted outside ActiveRecord, so you'll need to update the counts manually; one approach is sketched below.
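One way to repair the counters after a bulk load is a single SQL statement that recomputes them from the real row counts (the authors/books tables and the books_count column are hypothetical):

    # Recompute every author's cached count from the actual rows
    ActiveRecord::Base.connection.execute(<<-SQL)
      UPDATE authors
         SET books_count = (SELECT COUNT(*)
                              FROM books
                             WHERE books.author_id = authors.id)
    SQL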
ActiveRecord gems. For example, if you use acts_as_audited, which keeps an audit trail of changes to model records, you won't get that functionality for data entered outside ActiveRecord.
Business Logic at the Model Layer. Good programmers try to put functionality at the model (or higher) level when they can, typically via callbacks. This might include things like updating other data, sending emails, and writing to logs. None of that happens if ActiveRecord is never invoked.
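To make that last point concrete, here is a hypothetical model whose callback sends an email whenever a record is created; a raw SQL INSERT silently skips it (OrderMailer and deliver_confirmation are assumptions, shown in the Rails 2.x mailer style):

    class Order < ActiveRecord::Base
      # Fires only for records created through ActiveRecord;
      # rows inserted with raw SQL never trigger it
      after_create :send_confirmation_email

      private

      def send_confirmation_email
        OrderMailer.deliver_confirmation(self)  # Rails 2.x mailer idiom
      end
    end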