Aggregate group multiple fields

13.5k views Asked by At

Given the following dataset:

{ "_id" : 1, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 25, "Q3" : 0, "Q4" : 0 }
{ "_id" : 2, "city" : "Reno", "cat": "roads", "Q1" : 30, "Q2" : 0, "Q3" : 0, "Q4" : 60 }
{ "_id" : 3, "city" : "Yuma", "cat": "parks", "Q1" : 0, "Q2" : 0, "Q3" : 45, "Q4" : 0 }
{ "_id" : 4, "city" : "Reno", "cat": "parks", "Q1" : 35, "Q2" : 0, "Q3" : 0, "Q4" : 0 }
{ "_id" : 5, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 15, "Q3" : 0, "Q4" : 20 }

I'm trying to achieve the following result. It would be great to just return the totals greater than zero, and also compress each city, cat and Qx total to a single record.

{
    "city" : "Yuma",
    "cat" : "roads",
    "Q2total" : 40
}, 
{
    "city" : "Reno",
    "cat" : "roads",
    "Q1total" : 30
},
{
    "city" : "Reno",
    "cat" : "roads",
    "Q4total" : 60
},
{
    "city" : "Yuma",
    "cat" : "parks",
    "Q3total" : 45
},
{
    "city" : "Reno",
    "cat" : "parks",
    "Q1total" : 35
},
{
    "city" : "Yuma",
    "cat" : "roads",
    "Q4total" : 20
}

Possible?

1

There are 1 answers

3
AudioBubble On BEST ANSWER

We could ask, to what end? Your documents already have a nice consistent Object structure which is recommended. Having objects with varying keys is not a great idea. Data is "data" and should not really be the name of the keys.

With that in mind, the aggregation framework actually follows this sense and does not allow for the generation of arbitrary key names from data contained in the document. But you could get a similar result with the output as data points:

db.junk.aggregate([
    // Aggregate first to reduce the pipeline documents somewhat
    { "$group": {
        "_id": {
            "city": "$city",
            "cat": "$cat"
        },
        "Q1": { "$sum": "$Q1" },
        "Q2": { "$sum": "$Q2" },
        "Q3": { "$sum": "$Q3" },
        "Q4": { "$sum": "$Q4" }
    }},

    // Convert the "quarter" elements to array entries with the same keys
    { "$project": {
        "totals": {
            "$map": {
                "input": { "$literal": [ "Q1", "Q2", "Q3", "Q4" ] },
                "as": "el",
                "in": { "$cond": [
                    { "$eq": [ "$$el", "Q1" ] },
                    { "quarter": "$$el", "total": "$Q1" },
                    { "$cond": [
                        { "$eq": [ "$$el", "Q2" ] },
                        { "quarter": "$$el", "total": "$Q2" },
                        { "$cond": [
                           { "$eq": [ "$$el", "Q3" ] },
                           { "quarter": "$$el", "total": "$Q3" },
                           { "quarter": "$$el", "total": "$Q4" }
                        ]}
                    ]}
                ]}
            }
        }
    }},

    // Unwind the array produced
    { "$unwind": "$totals" },

    // Filter any "0" resutls
    { "$match": { "totals.total": { "$ne": 0 } } },

    // Maybe project a prettier "flatter" output
    { "$project": {
        "_id": 0,
        "city": "$_id.city",
        "cat": "$_id.cat",
        "quarter": "$totals.quarter",
        "total": "$totals.total"
    }}
])

Which gives you results like this:

{ "city" : "Reno", "cat" : "parks", "quarter" : "Q1", "total" : 35 }
{ "city" : "Yuma", "cat" : "parks", "quarter" : "Q3", "total" : 45 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q1", "total" : 30 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q4", "total" : 60 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q2", "total" : 40 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q4", "total" : 20 }

You could alternately use mapReduce which allows "some" flexibility with key names. The catch is though that your aggregation is still by "quarter", so you need that as part of the primary key, which cannot be changed once emitted.

Additionally, you cannot "filter" any aggregated results of "0" without a second pass after outputting to a collection, so it's not really of much use for what you want to do, unless you can live with a second mapReduce operation of "transform" query on the output collection.

Worth note is if you look at what is being done in the "second" pipeline stage here with $project and $map you will see that the document structure is essentially being altered to sometime like what you could alternately structure your documents like originally, like this:

{
    "city" : "Reno", 
    "cat" : "parks"
    "totals" : [ 
        { "quarter" : "Q1", "total" : 35 }, 
        { "quarter" : "Q2", "total" : 0 }, 
        { "quarter" : "Q3", "total" : 0 }, 
        { "quarter" : "Q4", "total" : 0 }
    ]
},
{ 
    "city" : "Yuma", 
    "cat" : "parks"
    "totals" : [ 
        { "quarter" : "Q1", "total" : 0 }, 
        { "quarter" : "Q2", "total" : 0 }, 
        { "quarter" : "Q3", "total" : 45 }, 
        { "quarter" : "Q4", "total" : 0 } 
    ]
}

Then the aggregation operation becomes simple for your documents to the same results as shown above:

db.collection.aggregate([
    { "$unwind": "$totals" },
    { "$group": {
        "_id": {
            "city": "$city",
            "cat": "$cat",
            "quarter": "$totals.quarter"
        },
        "ttotal": { "$sum": "$totals.total" }
    }},
    { "$match": { "ttotal": { "$ne": 0 } },
    { "$project": {
        "_id": 0,
        "city": "$_id.city",
        "cat": "$_id.cat",
        "quarter": "$_id.quarter",
        "total": "$ttotal"
    }}
])

So it might make more sense to consider structuring your documents in that way to begin with and avoid any overhead required by the document transformation.

I think you'll find that consistent key names makes a far better object model to program to, where you should be reading the data point from the key-value and not the key-name. If you really need to, then it's a simple matter of reading the data from the object and transforming the keys of each already aggregated result in post processing.