Efficient retrieval of large data sets in MongoMapper?


I am storing a large amount of Twitter data and would like to retrieve about 500k records at a time for processing. I have a TwitterTweet Mongo document that contains basic tweet data, and I try to retrieve it as follows:

weekly_tweets = TwitterTweet.all(:created_at.gt => 1.week.ago, :fields => [:created_at, :text, :from_user])

Trouble is, this takes a LOT of time and memory. Is there any way to make it more scalable and efficient? I have thought of using map-reduce, but it looks very complicated for what I want to do: text processing and regexp work on the tweets.


1 Answer

Michael Papile

Do not call all: it instantiates objects for all 500k of your entries in Mongo and will, as you noticed, use a ton of memory and time. Use find_each instead and iterate through the results. find_each works off a cursor, which is far more efficient.
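
A minimal sketch of the cursor-based approach, assuming MongoMapper's find_each accepts the same query options as all (the model and field names are taken from the question):

# Stream matching tweets through a MongoDB cursor instead of
# materializing all 500k documents in memory at once.
TwitterTweet.find_each(:created_at.gt => 1.week.ago,
                       :fields => [:created_at, :text, :from_user]) do |tweet|
  # Per-document text processing, e.g. pull @mentions out with a regexp.
  mentions = tweet.text.scan(/@\w+/)
  # ... further processing ...
end

Because the cursor fetches documents in batches, memory use stays roughly constant no matter how many tweets match the query.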