MongoDB: Which is the best strategy, simple or compound indexes?

1k views Asked by At

I have a question about how index optimization works in Mongo.

My application is separated by clients (client_id field) and they can filter their users by different fields of their choose freely. That's why I have created indexes on different combinations to ensure the use of indexes.

If you search for "client_id + category + address" is an index composed of 3 fields equally optimized as 3 simple indexes (one for each field)? This is my biggest doubt. If there are 6 filterable fields, with different combinations. How do you recommend creating the indexes?

Thanks!

1

There are 1 answers

2
user20042973 On BEST ANSWER

Using multiple indexes to execute a query is known as Index Intersection. As @Sergio Tulentsev points out in the comments, it is almost always the case that using a single compound index (that aligns closely with a query shape) will be faster than relying on index intersection.

The reason for this is because index intersection requires the database to do scan more index keys than a plan that uses a compound index. At a minimum, index intersection requires a larger number of total index keys will be scanned and the database must also merge these results together.

Let's explore this further. Consider that we have the following 190 documents in a collection, 100 of which match the query predicates individually but only 10 match both:

test> db.foo.count()
190
test> db.foo.count({x:123})
100
test> db.foo.count({y:456})
100
test> db.foo.count({x:123,y:456})
10

Assuming single field indexes of { x: 1 } and { y : 1 } exists, then the associated index intersection plan is going to scan both based on their associated predicates:

  {
    stage: 'IXSCAN',
    ...
    indexBounds: { x: [ '[123, 123]' ] }
  },
  {
    stage: 'IXSCAN',
    ...
    indexBounds: { y: [ '[456, 456]' ] }
  }

In order to ensure complete and correct results, the database will need to complete the scan of one of those indexes. Depending on how the data was inserted, this will require between 110 and 200 total index keys to be examined:

    nReturned: 10,
    totalKeysExamined: 110,
    totalDocsExamined: 10,
    nReturned: 10,
    totalKeysExamined: 200,
    totalDocsExamined: 10,

Note in this particular example that 200 is more than the total number of documents in the collection.

By contrast, the database is able to precisely examine only the 10 relevant index keys when using the appropriate compound index instead:

    nReturned: 10,
    totalKeysExamined: 10,
    totalDocsExamined: 10,
    ...
        stage: 'IXSCAN',
        ...
        indexBounds: { x: [ '[123, 123]' ], y: [ '[456, 456]' ] },

Also keep in mind the warnings that MongoDB currently has on its page about index intersection which reads, in part:

This page documents cases where the query optimizer may be able to use index intersection.

In practice, the query optimizer rarely selects plans that use index intersection.

Hash-based index intersection is disabled by default and sort-based index intersection is disfavored in plan selection.

According to that page, it is best to rely on Compound Indexes for now unless you are using Atlas Search which the page suggests has different behavior.

The fact that such plans require scanning more index keys likely contributes to them not being selected very frequently in practice today.

Finally, the ordering of index keys is very important. That is a separate topic, but some material on the topic can be found in the following links:


Edit

Courtesy of @rickhg12hs, here is a playground link. Interestingly, it kind of proves the point above that we shouldn't currently rely on index intersection with core MongoDB indexes. The output in that playground shows that there weren't even any index intersection plans generated for consideration by the query planner in that particular configuration.

It's impossible to tell why that might be the case. But there are other oddities as well, such as the fact that some layer transposed the order of the keys for the index bounds (probably making them alphabetical?):

            "indexBounds": {
              "x": [
                "[20.0, 20.0]"
              ],
              "y": [
                "[32.0, 32.0]"
              ]
            },
            "indexName": "y_1_x_1",