Golang MongoDB (mgo) aggregation with nested arrays


I have MongoDB data of the following form:

{"_id":"53eb9a5673a57578a10074ec","data":{"statistics":{"gsm":[{"type":"Attacks","value":{"team1":66,"team2":67}},{"type":"Corners","value":{"team1":8,"team2":5}},{"type":"Dangerous attacks","value":{"team1":46,"team2":49}},{"type":"Fouls","value":{"team1":9,"team2":14}},{"type":"Free kicks","value":{"team1":18,"team2":10}},{"type":"Goals","value":{"team1":2,"team2":1}},{"type":"Goal kicks","value":{"team1":10,"team2":11}},{"type":"Offsides","value":{"team1":1,"team2":4}},{"type":"Posession","value":{"team1":55,"team2":45}},{"type":"Shots blocked","value":{"team1":4,"team2":1}},{"type":"Shots off target","value":{"team1":7,"team2":5}}]}}}

I want to get the average of data.statistics.gsm.value.team1 for the array elements where data.statistics.gsm.type == "Attacks", using the Golang MongoDB driver mgo. This is the code I have tried so far (with either one or both of the group statements below):

pipeline := []bson.M{
    bson.M{"$match": bson.M{"kick_off.utc.gsm.date_time": bson.M{"$gt": start, "$lt": end}}},
    bson.M{
        "$group": bson.M{
            "_id":         "$gsm_id",
            "event_array": bson.M{"$first": "$data.statistics.gsm"},
        },
    },
    bson.M{
        "$group": bson.M{
            "_id":        "$type",
            "avg_attack": bson.M{"$avg": "$data.statistics.gsm.value.team1"},
        },
    },
}

With only the first group statement, I get back the below, but the second group statement doesn't help me get the average.

[{"_id":1953009,"event_array":[{"type":"Attacks","value":{"team1":48,"team2":12}},{"type":"Corners","value":{"team1":12,"team2":0}},{"type":"Dangerous attacks","value":{"team1":46,"team2":7}},{"type":"Fouls","value":{"team1":10,"team2":3}},{"type":"Free kicks","value":{"team1":5,"team2":12}},{"type":"Goals","value":{"team1":8,"team2":0}}

Answer by Verran (best answer)

I always find it helpful to look at a pretty-printed view of the JSON. Here is what you say you get from the first group statement:

[
  {
    "_id": 1953009,
    "event_array": [
      {
        "type": "Attacks",
        "value": {
          "team1": 48,
          "team2": 12
        }
      },
      {
        "type": "Corners",
        "value": {
          "team1": 12,
          "team2": 0
        }
      },
...

Now the second group statement you use:

"$group": bson.M{
     "_id":     "$type",
     "avg_attack" : bson.M{"$avg": "$data.statistics.gsm.value.team1"}
}

You're trying to average data.statistics.gsm.value.team1 over the results of the first group statement, but those results only contain _id and event_array. Neither that path nor the top-level $type used as the group key exists at that point, so the second group has nothing to average.

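Instead of the approach you're using, I'd suggest looking into the $unwind operator, which breaks the array down into one document per element. After {$unwind: "$event_array"}, each document carries a single statistic under event_array, so you should be able to group by "$event_array.type" and take the average with {$avg: "$event_array.value.team1"}.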

So the overall pipeline used to produce the aggregation would be: $match -> $group1 -> $unwind -> $group2. Just keep in mind that each stage of the pipeline operates on the documents produced by the previous stage, which is why the data.statistics.gsm.value.team1 path in your second group was incorrect.