I built a rake task that downloads a zip file from the Awin datafeed and imports it into my Product model via activerecord-import:
```ruby
require 'csv'
require 'zip'
require 'httparty'
require 'active_record'
require 'activerecord-import'

namespace :affiliate_datafeed do
  desc "Import products data from Awin"
  task import_product_awin: :environment do
    url = "https://productdata.awin.com"
    dir = "db/affiliate_datafeed/awin.zip"

    # Download the zipped datafeed and save it locally
    File.open(dir, "wb") do |f|
      f.write HTTParty.get(url).body
    end

    # Read the first CSV file in the archive; the block form closes the zip
    csv_text = Zip::File.open(dir) do |zip_file|
      zip_file.glob('*.csv').first.get_input_stream.read
    end

    # Build Product instances in memory, then bulk insert them in one call
    products = []
    CSV.parse(csv_text, headers: true).each do |row|
      products << Product.new(row.to_h)
    end
    Product.import(products)
  end
end
```
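For a large feed, reading the whole CSV into one String and building every `Product` before a single `import` call can use a lot of memory. A minimal sketch of streaming the rows in fixed-size batches instead, using only Ruby's standard `csv` library; the method name `each_row_batch` and the default batch size of 1,000 are illustrative assumptions, not part of the original task:

```ruby
require 'csv'

# Hypothetical helper: streams CSV rows from any IO in fixed-size batches,
# so the whole feed never has to sit in memory as one String or one Array.
# Yields an Array of Hashes (header => value) for each batch.
def each_row_batch(io, batch_size: 1_000)
  CSV.new(io, headers: true).each_slice(batch_size) do |rows|
    yield rows.map(&:to_h)
  end
end
```

Inside the rake task, each batch could be passed to `Product.import(batch.map { |attrs| Product.new(attrs) })`. Passing rubyzip's `entry.get_input_stream` straight to `CSV.new` would also avoid the intermediate String, but whether that stream supports everything `CSV` needs is an assumption to check against the installed rubyzip version.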
How can I update the products table only when a product doesn't exist yet or when its last_updated field has a newer date? And what is the best way to refresh a large database?
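One way to skip rows that are already up to date is to load the stored IDs and timestamps once, then keep only the CSV rows whose ID is unknown or whose `last_updated` is later than the stored value. A minimal sketch in plain Ruby; the column names `aw_product_id` and `last_updated` are assumptions about the feed layout, and `rows_to_upsert` is a hypothetical helper:

```ruby
require 'csv'
require 'time'

# Assumed column names; the real Awin feed headers may differ.
ID_COLUMN = 'aw_product_id'
UPDATED_COLUMN = 'last_updated'

# Returns the rows (as Hashes) that are new or changed.
# existing: Hash of product ID (String) => Time of the stored last_updated.
def rows_to_upsert(csv_text, existing)
  CSV.parse(csv_text, headers: true).select do |row|
    stored = existing[row[ID_COLUMN]]
    # Keep unknown IDs, and known IDs whose timestamp is strictly newer
    stored.nil? || Time.parse(row[UPDATED_COLUMN]) > stored
  end.map(&:to_h)
end
```

In the rake task, `existing` could be built with something like `Product.pluck(:aw_product_id, :last_updated).to_h`, assuming the model has those columns and the IDs are stored as Strings to match the CSV values. To update changed rows rather than insert duplicates, activerecord-import's `on_duplicate_key_update` option can be passed to `Product.import`; it relies on a unique index on the ID column.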
A possible approach is to have the rake task check the feed's `Last-Modified` response header (or the `last_updated` field of each row) on every run, and only download and import what has changed since the previous run.
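A minimal sketch of checking the `Last-Modified` header with a `HEAD` request before downloading the zip again. It uses Ruby's standard `Net::HTTP` rather than HTTParty, and it assumes the Awin endpoint actually sends `Last-Modified`, which is not confirmed here; `stamp_path` is a hypothetical local file holding the header value from the previous run:

```ruby
require 'net/http'
require 'time'
require 'uri'

# Sends a HEAD request and returns the Last-Modified header (a String),
# or nil if the server does not send one.
def remote_last_modified(url)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
  response['Last-Modified']
end

# Decides whether the feed should be downloaded again by comparing the
# current header with the one saved in stamp_path after the last import.
def feed_changed?(last_modified, stamp_path)
  return true if last_modified.nil? || !File.exist?(stamp_path)

  stored = File.read(stamp_path).strip
  return true if stored.empty?

  Time.httpdate(last_modified) > Time.httpdate(stored)
end
```

After a successful import, the task would write the new header value to `stamp_path` (for example with `File.write(stamp_path, last_modified)`) so the next run compares against it. If the server sends no `Last-Modified`, the sketch always re-downloads, and the per-row `last_updated` check is the remaining safeguard.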