MongoDB $lookup not using index

15.8k views Asked by At

I'm writing a query that requires a $lookup between two tables and as I understand it, it's essential that the foreignField have an index in order to perform this join in a timely fashion. However, even after adding an index on the field, the query is still falling back to COLLSCAN.

db.users.aggregate([
  {$lookup:{ from: "transactions", localField: '_id', foreignField: 'uid', as: 'transaction' }},
  { $match: { transaction: { "$size" : 0} } },
  { $count: "total"},
], { explain: true })

This returns:

"queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "test.users",
    "indexFilterSet" : false,
    "parsedQuery" : {

    },
    "winningPlan" : {
        "stage" : "COLLSCAN",
        "direction" : "forward"
    },
    "rejectedPlans" : [ ]
}

As I mentioned, I do have the uid field indexed in the transactions collection:

> db.transactions.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "test.transactions"
    },
    {
        "v" : 1,
        "key" : {
            "uid" : 1
        },
        "name" : "uid_1",
        "ns" : "test.transactions"
    }
]

The query takes a few minutes to run in a DB of approximately 7M documents. I'm using MongoDB v3.4.7. Any idea as to what I could be doing wrong? Thanks in advance!

2

There are 2 answers

0
Joe On

The "stage" : "COLLSCAN", isn't referring to the $lookup at all.

The first step in that aggregation pipeline is to fetch all of the documents from the 'users' collection. Since there are no filters at all provided for that, collection scan is the most efficient method.

The $lookup stage should be planned like any other query, and would likely use the index.

0
Ashutosh On

Because your aggregation pipeline first stage has no $match or $sort or $geoNear query on indexed key and In $match stage you didn't query on any index key.

Case 1: If you do $match on indexed key in first stage, WinningPlan stage will be "FETCH" and stage of inputStage will be "IXSCAN"

"winningPlan" : {
    "stage" : "FETCH",
    "inputStage" : {
            "stage" : "IXSCAN",
        ...
    }
}

Case 2: If you do $match on non-indexed key in first stage, WinningPlan stage will be "COLLSCAN"

"winningPlan" : {
    "stage" : "COLLSCAN"
}

Case 3: If You do $match on index key after lookup (according to your query), WinningPlan stage will be "FETCH" and inputStage will be "IXSCAN".

Case 4: If You do $match on non-index key after lookup (just you did), WinningPlan stage will be "COLLSCAN".

For 7M record, you must use index in your queries. Don't do too much indexing, because they will stored in the RAM and you can't use $ne or $nin on indexed key properly.

Mongodb Docs: Optimizing Aggregation Pipeline

Mongodb Docs: Indexing Strategies