Aggregate group multiple fields

Question

Aggregate group multiple fields

13.5k views Asked by wolstat At 12 June 2015 at 02:28

Given the following dataset:

{ "_id" : 1, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 25, "Q3" : 0, "Q4" : 0 }
{ "_id" : 2, "city" : "Reno", "cat": "roads", "Q1" : 30, "Q2" : 0, "Q3" : 0, "Q4" : 60 }
{ "_id" : 3, "city" : "Yuma", "cat": "parks", "Q1" : 0, "Q2" : 0, "Q3" : 45, "Q4" : 0 }
{ "_id" : 4, "city" : "Reno", "cat": "parks", "Q1" : 35, "Q2" : 0, "Q3" : 0, "Q4" : 0 }
{ "_id" : 5, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 15, "Q3" : 0, "Q4" : 20 }

I'm trying to achieve the following result. It would be great to just return the totals greater than zero, and also compress each city, cat and Qx total to a single record.

{
    "city" : "Yuma",
    "cat" : "roads",
    "Q2total" : 40
}, 
{
    "city" : "Reno",
    "cat" : "roads",
    "Q1total" : 30
},
{
    "city" : "Reno",
    "cat" : "roads",
    "Q4total" : 60
},
{
    "city" : "Yuma",
    "cat" : "parks",
    "Q3total" : 45
},
{
    "city" : "Reno",
    "cat" : "parks",
    "Q1total" : 35
},
{
    "city" : "Yuma",
    "cat" : "roads",
    "Q4total" : 20
}

Possible?

Original Q&A

There are 1 answers

**AudioBubble** · Accepted Answer · 2015-06-12T05:07:04+00:00

We could ask, to what end? Your documents already have a nice consistent Object structure which is recommended. Having objects with varying keys is not a great idea. Data is "data" and should not really be the name of the keys.

With that in mind, the aggregation framework actually follows this sense and does not allow for the generation of arbitrary key names from data contained in the document. But you could get a similar result with the output as data points:

db.junk.aggregate([
    // Aggregate first to reduce the pipeline documents somewhat
    { "$group": {
        "_id": {
            "city": "$city",
            "cat": "$cat"
        },
        "Q1": { "$sum": "$Q1" },
        "Q2": { "$sum": "$Q2" },
        "Q3": { "$sum": "$Q3" },
        "Q4": { "$sum": "$Q4" }
    }},

    // Convert the "quarter" elements to array entries with the same keys
    { "$project": {
        "totals": {
            "$map": {
                "input": { "$literal": [ "Q1", "Q2", "Q3", "Q4" ] },
                "as": "el",
                "in": { "$cond": [
                    { "$eq": [ "$$el", "Q1" ] },
                    { "quarter": "$$el", "total": "$Q1" },
                    { "$cond": [
                        { "$eq": [ "$$el", "Q2" ] },
                        { "quarter": "$$el", "total": "$Q2" },
                        { "$cond": [
                           { "$eq": [ "$$el", "Q3" ] },
                           { "quarter": "$$el", "total": "$Q3" },
                           { "quarter": "$$el", "total": "$Q4" }
                        ]}
                    ]}
                ]}
            }
        }
    }},

    // Unwind the array produced
    { "$unwind": "$totals" },

    // Filter any "0" resutls
    { "$match": { "totals.total": { "$ne": 0 } } },

    // Maybe project a prettier "flatter" output
    { "$project": {
        "_id": 0,
        "city": "$_id.city",
        "cat": "$_id.cat",
        "quarter": "$totals.quarter",
        "total": "$totals.total"
    }}
])

Which gives you results like this:

{ "city" : "Reno", "cat" : "parks", "quarter" : "Q1", "total" : 35 }
{ "city" : "Yuma", "cat" : "parks", "quarter" : "Q3", "total" : 45 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q1", "total" : 30 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q4", "total" : 60 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q2", "total" : 40 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q4", "total" : 20 }

You could alternately use mapReduce which allows "some" flexibility with key names. The catch is though that your aggregation is still by "quarter", so you need that as part of the primary key, which cannot be changed once emitted.

Additionally, you cannot "filter" any aggregated results of "0" without a second pass after outputting to a collection, so it's not really of much use for what you want to do, unless you can live with a second mapReduce operation of "transform" query on the output collection.

Worth note is if you look at what is being done in the "second" pipeline stage here with $project and $map you will see that the document structure is essentially being altered to sometime like what you could alternately structure your documents like originally, like this:

{
    "city" : "Reno", 
    "cat" : "parks"
    "totals" : [ 
        { "quarter" : "Q1", "total" : 35 }, 
        { "quarter" : "Q2", "total" : 0 }, 
        { "quarter" : "Q3", "total" : 0 }, 
        { "quarter" : "Q4", "total" : 0 }
    ]
},
{ 
    "city" : "Yuma", 
    "cat" : "parks"
    "totals" : [ 
        { "quarter" : "Q1", "total" : 0 }, 
        { "quarter" : "Q2", "total" : 0 }, 
        { "quarter" : "Q3", "total" : 45 }, 
        { "quarter" : "Q4", "total" : 0 } 
    ]
}

Then the aggregation operation becomes simple for your documents to the same results as shown above:

db.collection.aggregate([
    { "$unwind": "$totals" },
    { "$group": {
        "_id": {
            "city": "$city",
            "cat": "$cat",
            "quarter": "$totals.quarter"
        },
        "ttotal": { "$sum": "$totals.total" }
    }},
    { "$match": { "ttotal": { "$ne": 0 } },
    { "$project": {
        "_id": 0,
        "city": "$_id.city",
        "cat": "$_id.cat",
        "quarter": "$_id.quarter",
        "total": "$ttotal"
    }}
])

So it might make more sense to consider structuring your documents in that way to begin with and avoid any overhead required by the document transformation.

I think you'll find that consistent key names makes a far better object model to program to, where you should be reading the data point from the key-value and not the key-name. If you really need to, then it's a simple matter of reading the data from the object and transforming the keys of each already aggregated result in post processing.

TechQA.

Aggregate group multiple fields

There are 1 answers

Related Questions in MONGODB

Related Questions in AGGREGATION-FRAMEWORK

Popular Questions

Popular Tags

Trending Questions