I have created 4 indexes to test query performance in my collection when querying for two fields of the same document, one of which is an array (and so needs a multi-key index). Two of the indexes are single-field and two are compound.
I am surprised to get better performance with one of the single-field indexes than with the compound ones. I was expecting the best performance from a compound index, because I understand that it indexes both fields and so allows faster querying.
These are my indexes:
{ "v" : 1,
"key" : { "_id" : 1 },
"ns" : "bt_twitter.mallorca.mallorca",
"name" : "_id_"
},
{ "v" : 1,
"key" : { "epoch_creation_date" :1 },
"ns" : "bt_twitter.mallorca.mallorca",
"name" : "epoch_creation_date_1"
},
{ "v" : 1,
"key" : { "related_hashtags" : 1 },
"ns" : "bt_twitter.mallorca.mallorca",
"name" : "related_hashtags_1"
},
{ "v" : 1,
"key" : { "epoch_creation_date" : 1, "related_hashtags" : 1 },
"ns" : "bt_twitter.mallorca.mallorca",
"name" : "epoch_creation_date_1_related_hashtags_1"
}
My queries and performance indicators are (hint parameter shows the index used at each query):
QUERY 1:
active_collection.find(
{'epoch_creation_date': {'$exists': True}},
{"_id": 0, "related_hashtags":1}
).hint([("epoch_creation_date", ASCENDING)]).explain()
millis: 237
nscanned: 101226
QUERY 2:
active_collection.find(
{'epoch_creation_date': {'$exists': True}},
{"_id": 0, "related_hashtags": 1}
).hint([("related_hashtags", ASCENDING)]).explain()
millis: 1131
nscanned: 306715
QUERY 3:
active_collection.find(
{'epoch_creation_date': {'$exists': True}},
{"_id": 0, "related_hashtags": 1}
).hint([("epoch_creation_date", ASCENDING), ("related_hashtags", ASCENDING)]).explain()
millis: 935
nscanned: 306715
QUERY 4:
active_collection.find(
{'epoch_creation_date': {'$exists': True}},
{"_id": 0, "related_hashtags": 1}
).hint([("related_hashtags", ASCENDING),("epoch_creation_date", ASCENDING)]).explain()
millis: 1165
nscanned: 306715
QUERY 1 scans fewer documents, which is probably why it is faster. Can somebody help me understand why it performs better than the queries with compound indexes? And when is it better to use a compound index than a single-field one?
I am reading the MongoDB documentation, but these concepts are proving hard for me to digest.
Thanks in advance.
UPDATED question (in response to Sammaye and Philipp)
This is the result of a full explain()
"cursor" : "BtreeCursor epoch_creation_date_1",
"isMultiKey" : false,
"n" : 101226,
"nscannedObjects" : 101226,
"nscanned" : 101226,
"nscannedObjectsAllPlans" : 101226,
"nscannedAllPlans" : 101226,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 242,
"indexBounds" : {u'epoch_creation_date': [[{u'$minElement': 1}, {u'$maxElement': 1}]]
},
"server" : "vmmongodb:27017"
for the following query:
active_collection.find(
{'epoch_creation_date': {'$exists': True}},
{"_id": 0, "related_hashtags":1}
).hint([("epoch_creation_date", ASCENDING)]).explain()
You created a compound index (named epoch_creation_date_1_related_hashtags_1), but you aren't using it in those hints. Instead, you are using the two single-field indexes you also created (related_hashtags_1 and epoch_creation_date_1) in different order.
Of those two indexes, only epoch_creation_date_1 is effective, because you aren't querying for both fields. You are only querying for one, and this is 'epoch_creation_date': {'$exists': True}. The field-filtering which you perform with {"_id": 0, "related_hashtags":1} is done on the documents which were found by that query. At that point, indexes are of no use anymore. That means any index on related_hashtags won't be able to increase performance on this query. The compound index (when you would actually use it) might be better than no index at all, but not as good as the index on epoch_creation_date only.
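To get a feel for the nscanned numbers, here is a rough pure-Python sketch, not MongoDB itself. It assumes only that a multikey index stores one entry per array element, so any index containing related_hashtags holds more entries than documents. The toy documents and the helpers build_index and entries_in_bounds are made up for illustration, and empty arrays are not modeled.

```python
# Toy documents (hypothetical data); field names follow the question.
docs = [
    {"_id": 1, "epoch_creation_date": 100, "related_hashtags": ["a", "b", "e"]},
    {"_id": 2, "epoch_creation_date": 200, "related_hashtags": ["a"]},
    {"_id": 3, "epoch_creation_date": 200, "related_hashtags": ["b", "c", "d", "e"]},
    {"_id": 4, "epoch_creation_date": 200, "related_hashtags": ["a", "e"]},
    {"_id": 5, "epoch_creation_date": 300, "related_hashtags": ["b"]},
]


def build_index(docs, fields):
    """Model an index as sorted (key_tuple, _id) entries.

    An array field contributes one entry per element (multikey behaviour).
    """
    entries = []
    for doc in docs:
        keys = [()]
        for field in fields:
            value = doc[field]
            values = value if isinstance(value, list) else [value]
            keys = [key + (v,) for key in keys for v in values]
        entries.extend((key, doc["_id"]) for key in keys)
    return sorted(entries)


def entries_in_bounds(index, prefix=()):
    """Count the entries whose key starts with `prefix` (the scan bounds)."""
    return sum(1 for key, _ in index if key[:len(prefix)] == prefix)


date_index = build_index(docs, ["epoch_creation_date"])
tag_index = build_index(docs, ["related_hashtags"])
compound_index = build_index(docs, ["epoch_creation_date", "related_hashtags"])

# {'$exists': True} on one field gives no useful bounds, so a hinted index
# is walked from end to end: the cost is simply its number of entries.
print(len(date_index), len(tag_index), len(compound_index))  # 5 11 11

# A query that filters on BOTH fields can use tight bounds on the compound index.
print(entries_in_bounds(date_index, (200,)))          # 3, then filter by hashtag
print(entries_in_bounds(tag_index, ("e",)))           # 3, then filter by date
print(entries_in_bounds(compound_index, (200, "e")))  # 2, no extra filtering
```

In this toy model the index on epoch_creation_date alone is the smallest one to walk when the query only tests that field, which is the same pattern as 101226 versus 306715 in the question. The compound index only pays off once the query constrains both fields.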