Is it possible to merge array fields while using the MongoDB aggregation framework? Here is a summary of the problem I am trying to solve:

Sample input documents for aggregation:

{
  "Category" : 1,
  "Messages" : ["Msg1", "Msg2"],
  "Value" : 1
},
{
  "Category" : 1,
  "Messages" : [],
  "Value" : 10
},
{
  "Category" : 1,
  "Messages" : ["Msg1", "Msg3"],
  "Value" : 100
},
{
  "Category" : 2,
  "Messages" : ["Msg4"],
  "Value" : 1000
},
{
  "Category" : 2,
  "Messages" : ["Msg5"],
  "Value" : 10000
},
{
  "Category" : 3,
  "Messages" : [],
  "Value" : 100000
}

We want to group by 'Category' while summing up 'Value' and merging 'Messages'. I have tried this aggregation pipeline:

{$group : {
        _id : "$Category",
        Value : { $sum : "$Value"},
        Messages : {$push : "$Messages"}
    }
}, 
{$unwind : "$Messages"}, 
{$unwind : "$Messages"}, 
{$group : {
        _id : "$_id",
        Value : {$first : "$Value"},
        Messages : {$addToSet : "$Messages"}
    }
}

The result is:

"result" : [{
        "_id" : 1,
        "Value" : 111,
        "Messages" : ["Msg3", "Msg2", "Msg1"]
    }, 
    {
        "_id" : 2,
        "Value" : 11000,
        "Messages" : ["Msg5", "Msg4"]
    }
]

However, this completely misses Category 3 since the documents where 'Category' is 3 do not have any 'Messages' and they are dropped by the second unwind. We would like the result to include the following as well:

{
    "_id" : 3,
    "Value" : 100000,
    "Messages" : []
}

Is there a neat way of achieving this by the aggregation framework?

  • is Messages guaranteed to be there as an array? Or is it possible it won't exist or will be there but as a different type? Commented Oct 18, 2013 at 1:11
  • yes Messages is guaranteed to exist as an array (which may be empty for some records). Commented Oct 29, 2013 at 11:13
  • have you tried the preserveNullAndEmptyArrays option to $unwind? Commented Dec 20, 2016 at 15:40
  • This question was raised when we were using v2.6. I believe using preserveNullAndEmptyArrays should do what we were looking for. Commented Mar 30, 2017 at 13:09

2 Answers

Here is a trick you can use if Messages is guaranteed to be an array:

> db.messages.find()
    { "Category" : 1, "Messages" : [  "Msg1",  "Msg2" ], "Value" : 1 }
    { "Category" : 1, "Messages" : [ ], "Value" : 10 }
    { "Category" : 1, "Messages" : [  "Msg1",  "Msg3" ], "Value" : 100 }
    { "Category" : 2, "Messages" : [  "Msg4" ], "Value" : 1000 }
    { "Category" : 2, "Messages" : [  "Msg5" ], "Value" : 10000 }
    { "Category" : 3, "Messages" : [ ], "Value" : 100000 }

> var group1 = {
    "$group":   {
        "_id":      "$Category",
        "Value":    {
            "$sum":     "$Value"
        },
        "Messages": {
            "$push":    "$Messages"
        }
    }
};

> var project1 = {
    "$project": {
        "Value":    1,
        "Messages": {
            "$cond":    [
                {
                    "$eq":  [
                        "$Messages",
                        [ [ ] ]
                    ]
                },
                [ [ null ] ],
                "$Messages"
            ]
        }
    }
};

> db.messages.aggregate( group1, project1 )
    { "_id" : 3, "Value" : 100000, "Messages" : [  [  null ] ] }
    { "_id" : 2, "Value" : 11000, "Messages" : [  [  "Msg4" ],  [  "Msg5" ] ] }
    { "_id" : 1, "Value" : 111, "Messages" : [  [  "Msg1",  "Msg2" ],  [ ],  [  "Msg1",  "Msg3" ] ] }

Now unwind twice and re-group to get a single Messages array.

> var unwind = {"$unwind":"$Messages"};

> var group2 = {
    $group: {
        "_id":      "$_id", 
        "Value":    {
            "$first":       "$Value"
        }, 
        "Messages": {
            "$addToSet":    "$Messages"
        }
    }
};

> var project2 = {
    "$project": {
        "Category": "$_id",
        "_id":      0,
        "Value":    1,
        "Messages": {
            "$cond":    [
                {
                    "$eq":  [
                        "$Messages",
                        [ null ]
                    ]
                },
                [ ],
                "$Messages"
            ]
        }
    }
};

> db.messages.aggregate(group1, project1, unwind, unwind, group2 ,project2 )
    { "Value" : 111, "Messages" : [  "Msg3",  "Msg2",  "Msg1" ], "Category" : 1 }
    { "Value" : 11000, "Messages" : [  "Msg5",  "Msg4" ], "Category" : 2 }
    { "Value" : 100000, "Messages" : [ ], "Category" : 3 }
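
To see why the [ [ null ] ] placeholder survives both unwinds where [ [ ] ] does not, here is a minimal sketch in plain JavaScript. The unwind function is a hypothetical stand-in that only models the default behaviour of $unwind (one output document per array element, none for an empty array); it is not MongoDB's implementation and does not need a database to run.

```javascript
// Plain-JS model of $unwind on one field: emits one copy of the document per
// array element, so an empty array emits nothing. Illustration only.
function unwind(docs, field) {
  return docs.flatMap(doc =>
    doc[field].map(item => ({ ...doc, [field]: item }))
  );
}

// Category 3 as it looks after group1, without and with the project1 placeholder.
const raw = [{ _id: 3, Value: 100000, Messages: [[]] }];
const patched = [{ _id: 3, Value: 100000, Messages: [[null]] }];

// Unwinding twice: the raw document disappears, the patched one is kept
// with Messages set to null.
const droppedCount = unwind(unwind(raw, "Messages"), "Messages").length;
const kept = unwind(unwind(patched, "Messages"), "Messages");

console.log(droppedCount);
console.log(kept);
```

The surviving document carries Messages: null, which is why group2 collects [ null ] for Category 3 and project2 then swaps it for an empty array.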

5 Comments

Thanks for the tips. It almost does what I need. However, there is a case where it does not produce the desired result. Aggregated result for Category 1 (based on the documents in my original post) ends up having 4 messages: ["Msg1", "Msg2", "Msg3", "dummy"]. I am not sure how to easily get rid of "dummy" for this case.
Right - there is a way to get rid of it - I'll update the answer
ok, the complete answer now with all the steps - should be exactly what you want :)
@AsyaKamsky Thanks, this is a great help. Could you help me with one more use case? I have two array fields in my document, say messages and tags, and I need the same behaviour for both fields.
post it as a question with full details - comments aren't really for discussing new problems.

As already mentioned in one of the comments, the simplest answer to the original question is to add the preserveNullAndEmptyArrays option to the second $unwind stage (available in MongoDB 3.2 and later).
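
A sketch of what that could look like, assuming MongoDB 3.2 or later and the collection name messages used in the other answer. Only the second $unwind needs the option, because the outer array built by $push always has at least one element per group. The variable name pipeline is just for illustration.

```javascript
// Same pipeline as in the question, with the second $unwind switched to the
// document form so that empty inner arrays no longer drop the document.
const pipeline = [
  { $group: {
      _id: "$Category",
      Value: { $sum: "$Value" },
      Messages: { $push: "$Messages" }   // array of arrays, one per input doc
  } },
  { $unwind: "$Messages" },               // outer array is never empty
  { $unwind: { path: "$Messages", preserveNullAndEmptyArrays: true } },
  { $group: {
      _id: "$_id",
      Value: { $first: "$Value" },
      Messages: { $addToSet: "$Messages" }
  } }
];

// In the mongo shell (not run here):
// db.messages.aggregate(pipeline)
```

With the option set, a document whose Messages is an empty array passes through the second $unwind without a Messages field, so the final $addToSet should have nothing to collect for Category 3 and leave its Messages empty. Treat that as the expected behaviour to verify against your server version rather than as tested output.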
