Mongo query latest version of entries

34 views Asked by At

I try to get every latest entry of a "group" out of Mongo.

You can image the data like a set of wiki entries:

  • Each entry can be edited (and old versions are kept).
  • We are interested in the latest version of the entries.
  • The entries (and their ancestors) are sorted by a certain logic.

To get more into detail:

  • Each entry is identified by an itemId
  • Each itemId can occur 1 to n times in the dataset
  • The entry versions are sorted by orderHash

Here you can see a small batch of data:

{
    "_id": ObjectId("5f69b963b8705200282d6174"),
    "itemId": "xszy",
    "orderHash": "383u",
    "title": "A",
    "content": "ABC"
},
{
    "_id": ObjectId("5f69b963b8705200282d6175"),
    "itemId": "e92q",
    "orderHash": "5j12",
    "title": "K",
    "content": "KLMN"
},
{
    "_id": ObjectId("5f69b963b8705200282d6178"),
    "itemId": "xszy",
    "orderHash": "p578",
    "title": "A",
    "content": "ABD"
},
{
    "_id": ObjectId("5f69b963b8705200282d6180"),
    "itemId": "mtewy",
    "orderHash": "383u",
    "title": "L",
    "content": "CASE"
},
{
    "_id": ObjectId("5f69b963b8705200282d6189"),
    "itemId": "mtewy",
    "orderHash": "5j12",
    "title": "L1",
    "content": "CASE"
}

I use the following aggregate pipeline to get my desired result:

db.getCollection('wiki').aggregate([
    {
        // create a sortable field for mongo
        $set: { 
            "sortField": {
                "$indexOfArray": [
                    [ "p578", "5j12", "383u" ], "$orderHash"
                ]
            }
        }
    },
    {
        // sort by created sort field
        "$sort": {
            "sortField": 1 
        }
    },
    {
        // group items by itemId and "save" them sorted in an array
        $group: {
            _id: "$itemId",
            data: {
                $push: "$$ROOT"
            }
        }
    },
    {
        // get first entry of each "group array" to obtain
        // the latest version of each entry
        $project: { 
            resp: { $arrayElemAt: ['$data', 0] }
        } 
    }
])

So far this is working. But it feels wrong to store the data in an extra array and I'm not sure about the performance (especially for larger data sets with a lot of edits). Another flaw is that I don't simply get an array of documents back but the data is nested in a documents property resp (which is not good to use for e.g. mongoose).

Is there a more mongo-nic solution for my problem?

SOLUTION:

Thanks to Joe (see comments). He pointed me to the solution.

db.getCollection('wiki').aggregate([
    {
        $set: { 
            "sortField": {
                "$indexOfArray": [
                    [ "p578", "5j12", "383u" ], "$orderHash"
                ]
            }
        }
    },
    {
        "$sort": {
            "sortField": 1 
        }
    },
    {
        $group: {
            _id: "$itemId",
            resp: {
                $first: "$$ROOT"
            }
        }
    },
    { 
        $replaceRoot: { 
            newRoot: "$resp"
        }
    }
])
0

There are 0 answers