This question is updated with a simpler explanation.
For my data the following Mongo CLI query takes 208ms. This query retrieves ALL the data for the 18 requested objects.
db.videos.find({avg_rating: {$gt: 1}, poster_large_thumb: {$exists:true}, release_date: {$lte: ISODate("2000-12-31")}}).sort({release_date:-1, avg_rating:-1, title:1}).skip(30).limit(18).pretty().explain()
{
"cursor" : "BasicCursor",
"nscanned" : 76112,
"nscannedObjects" : 76112,
"n" : 48,
"scanAndOrder" : true,
"millis" : 208,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
However with Mongoid when I do the query it creates a criteria object that has no actual data. I pass this criteria to a method that iterates it to create a JSON data structure. Since the data is needed each object must be retrieved from the DB. Here is the criteria returning query:
@videos = Video.order_by(release_date: -1, avg_rating: -1, title: 1).where(:avg_rating.gt => 1, :poster_large_thumb.exists => 1, :release_date.lte => start_date).skip(skip*POSTERS_PER_ROW).limit(limit*POSTERS_PER_ROW)
When I iterate @videos, each object takes over 240ms to retrieve from the DB, this from some debug output.
Getting one object:
2013-12-18 00:43:52 UTC
2013-12-18 00:43:52 UTC
0.24489331245422363
Assuming that if I got all the data in the Video.order_by(...) query it would take 208ms total how can I force it to do retrieval in one query instead of getting each objects individually?
Something here is causing the entire retrieval to take a couple of orders of magnitude more than the Mongo CLI.
Thanks for the ideas, I think @Arthur gave the important hint. No one could have answered because it looks like the problem was in another bit of code--in how I access the criteria.
Given the following query, which produces a criteria:
In a couple nested block I'm grabbing values with a line of code like this:
This random access notation seems to be the problem. It seems to execute the full Moped query for every individual object so POSTERS_PER_ROW*num_rows times.
If I grab all the videos before the loops with this bit of code:
Then grab the values from an array instead of the criteria like this:
I get only one Moped query of 248ms, all objects are retrieved with that one query. This is a huge speed up. the query time goes from num_rows* POSTERS_PER_ROW*248ms to just 248ms
This is a big lesson learned for me so if someone can point to the docs that describe this effect and the rules to follow I'd sure appreciate it.