50

I am trying to write an aggregation to identify accounts that use multiple payment sources. Typical data would be:

{
 account:"abc",
 vendor:"amazon",
}
 ...
{
 account:"abc",
 vendor:"overstock",
}

Now, I'd like to produce a list of accounts similar to this:

{
 account:"abc",
 vendorCount:2
}

How would I write this in MongoDB's aggregation framework?

8 Answers

82

I figured this out by using the $addToSet and $unwind operators.

(See also: MongoDB Aggregation count array/set size)

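db.collection.aggregate([
  // One document per account, with its distinct vendors collected in an array
  {
    $group: { _id: { account: '$account' }, vendors: { $addToSet: '$vendor' } }
  },
  // One document per (account, vendor) pair
  {
    $unwind: "$vendors"
  },
  // Count the unwound documents for each account
  {
    $group: { _id: "$_id", vendorCount: { $sum: 1 } }
  }
]);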

Hope it helps someone
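To make the three stages concrete, here is a plain-JavaScript sketch (no MongoDB connection) that mimics what each stage does to an in-memory array. The sample documents, including the repeated amazon purchase and the extra xyz account, are hypothetical, as are the helper names groupAddToSet, unwindVendors and groupCount; the real server also does not guarantee the order of grouped output.

```javascript
// Plain-JavaScript model of the pipeline; no MongoDB involved.
// Hypothetical sample data: a repeated vendor for "abc" and a second account.
const docs = [
  { account: "abc", vendor: "amazon" },
  { account: "abc", vendor: "overstock" },
  { account: "abc", vendor: "amazon" },
  { account: "xyz", vendor: "ebay" },
];

// Stage 1, $group + $addToSet: one document per account with its unique vendors.
function groupAddToSet(input) {
  const groups = new Map();
  for (const d of input) {
    if (!groups.has(d.account)) groups.set(d.account, new Set());
    groups.get(d.account).add(d.vendor);
  }
  return [...groups].map(([account, set]) => ({ _id: { account }, vendors: [...set] }));
}

// Stage 2, $unwind: one copy of the grouped document per array element.
function unwindVendors(input) {
  return input.flatMap((g) => g.vendors.map((v) => ({ _id: g._id, vendors: v })));
}

// Stage 3, $group with $sum: 1: count the unwound copies for each _id.
function groupCount(input) {
  const counts = new Map();
  for (const d of input) {
    const key = JSON.stringify(d._id);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, n]) => ({ _id: JSON.parse(key), vendorCount: n }));
}

const result = groupCount(unwindVendors(groupAddToSet(docs)));
console.log(result); // abc has vendorCount 2, xyz has vendorCount 1
```

The sketch also shows why the later comments suggest $size: stage 1 already holds the unique vendors, so stages 2 and 3 only re-derive the array's length.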


7 Comments

This may work for sets whose cardinality is small enough, but for big-data scenarios it won't (imagine hundreds of thousands of unique vendors).
This answer solves the big data scenario: stackoverflow.com/a/24770233/139721
Is it really necessary to iterate over $vendors again, given that we can compute the vendor count with results.get("vendors").size()?
@JerryChin The $size operator can be used in the pipeline for that: stackoverflow.com/questions/18501064/…
This may throw an "OperationFailure: BufBuilder attempted to grow()" exception! Any idea how to solve this problem? Thanks.
27

I think it's better to execute a query like the following, which avoids $unwind:

db.t2.insert({_id:1,account:"abc",vendor:"amazon"});
db.t2.insert({_id:2,account:"abc",vendor:"overstock"});


db.t2.aggregate([
{ $group : { _id : { "account" : "$account", "vendor" : "$vendor" }, number : { $sum : 1 } } },
{ $group : { _id : "$_id.account", number : { $sum : 1 } } }
]);

This shows the following result, which is what's expected:

{ "_id" : "abc", "number" : 2 }
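A plain-JavaScript sketch (no database involved) of why two $group stages yield a distinct count: the first stage collapses duplicate (account, vendor) pairs, and the second counts the surviving pairs per account. Unlike the $addToSet approach, no per-account array is ever built. The third sample document, a repeat amazon purchase, is hypothetical and added only to show the duplicate being collapsed.

```javascript
// Plain-JavaScript model of the two-$group pipeline; no MongoDB involved.
// The third document (a repeat amazon purchase) is hypothetical.
const docs = [
  { _id: 1, account: "abc", vendor: "amazon" },
  { _id: 2, account: "abc", vendor: "overstock" },
  { _id: 3, account: "abc", vendor: "amazon" },
];

// First $group: one entry per distinct (account, vendor) pair, with a document count.
const pairs = new Map();
for (const d of docs) {
  const key = JSON.stringify({ account: d.account, vendor: d.vendor });
  pairs.set(key, (pairs.get(key) || 0) + 1);
}

// Second $group: $sum: 1 over the pairs counts distinct vendors per account.
const perAccount = new Map();
for (const key of pairs.keys()) {
  const { account } = JSON.parse(key);
  perAccount.set(account, (perAccount.get(account) || 0) + 1);
}

const result = [...perAccount].map(([account, number]) => ({ _id: account, number }));
console.log(result); // one entry: _id "abc", number 2 (the repeat purchase is not counted twice)
```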

1 Comment

this assumes every account has at least one vendor
25

You can use sets

db.test.aggregate([
    {$group: { 
      _id: "$account", 
      uniqueVendors: {$addToSet: "$vendor"}
    }},
    {$project: {
      _id: 1, 
      vendorsCount: {$size: "$uniqueVendors"}
    }}
]);
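Modelled in plain JavaScript (no MongoDB involved, hypothetical sample data), $addToSet behaves like a Set per account and $size like reading that set's size. Every vendor for an account ends up inside a single grouped document, which is why the document size limit matters for very large vendor lists.

```javascript
// Plain-JavaScript model of $group/$addToSet followed by $project/$size.
// Hypothetical sample data; no MongoDB involved.
const docs = [
  { account: "abc", vendor: "amazon" },
  { account: "abc", vendor: "overstock" },
  { account: "abc", vendor: "amazon" },
  { account: "def", vendor: "ebay" },
];

// $addToSet: collect each account's vendors into a Set, so duplicates collapse.
const uniqueVendors = new Map();
for (const { account, vendor } of docs) {
  if (!uniqueVendors.has(account)) uniqueVendors.set(account, new Set());
  uniqueVendors.get(account).add(vendor);
}

// $size: replace each set with the number of elements it holds.
const result = [...uniqueVendors].map(([account, set]) => ({
  _id: account,
  vendorsCount: set.size,
}));
console.log(result); // abc -> 2, def -> 1
```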

1 Comment

Note that this will only work as long as all the vendors fit in a document, which is limited to 16MB. Probably fine for most cases, but if there are millions of vendors and/or the vendor ids are long (GUID strings anyone? :-/ ) then I guess the double group is the way to go.
10

I do not see why somebody would have to use $group twice.

db.collection.aggregate([ { $group: { "_id": "$account", "number": { $sum: 1 } } } ])
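A small plain-JavaScript check, using hypothetical data with one repeated purchase, shows what this single $group counts (documents per account) next to what the question asks for (distinct vendors per account). The two numbers agree only when no account uses the same vendor twice.

```javascript
// Hypothetical data: "abc" buys from amazon twice and from overstock once.
const docs = [
  { account: "abc", vendor: "amazon" },
  { account: "abc", vendor: "amazon" },
  { account: "abc", vendor: "overstock" },
];

// What { $group: { _id: "$account", number: { $sum: 1 } } } counts: documents.
const docCount = new Map();
for (const { account } of docs) {
  docCount.set(account, (docCount.get(account) || 0) + 1);
}

// What the question asks for: distinct vendors per account.
const vendorSets = new Map();
for (const { account, vendor } of docs) {
  if (!vendorSets.has(account)) vendorSets.set(account, new Set());
  vendorSets.get(account).add(vendor);
}

console.log(docCount.get("abc"), vendorSets.get("abc").size); // 3 2
```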

5 Comments

I suppose it's because they wanted to rename the keys and reformat the output. But this is, indeed, much better and much more efficient.
Is this a distinct count?
It is 'select group_id, count(*) from table_name group by group_id', rather than 'select count(distinct group_id) from table_name'.
This answer is wrong, as it assumes that no account will have the same vendor twice (i.e. it assumes that the number of documents for each account is the same as the number of distinct vendors). Completely wrong.
This answer yields the number of all documents with the same account. For example: account:"abc", account:"abc", account:"abc", account:"bbb" -> abc: 3, bbb: 1
5

This approach doesn't use $unwind or any other extra stages, and it keeps working if you add more accumulators to the $group. The accepted answer has a flaw here: if its first $group contains other accumulated fields, the $unwind stage copies them onto every vendor document, so the second $group either drops them or, if it re-sums them, double-counts them.

db.collection.aggregate([{
    "$group": {
        "_id": "$account",
        "vendors": {"$addToSet": "$vendor"}
    }
},
{
    "$addFields": {
        "vendorCount": {
            "$size": "$vendors"
        }
    }
}])
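A plain-JavaScript sketch of the flaw described for the accepted answer, assuming a hypothetical extra accumulator named orders (for example orders: { $sum: 1 }) in the first $group. $addFields leaves orders untouched, whereas $unwind copies it onto every vendor document, so a second $group that re-sums it double-counts.

```javascript
// Output of a first $group with a hypothetical extra accumulator, orders ($sum: 1).
const grouped = [{ _id: "abc", vendors: ["amazon", "overstock"], orders: 3 }];

// $addFields + $size: existing fields are kept and vendorCount is added.
const withAddFields = grouped.map((g) => ({ ...g, vendorCount: g.vendors.length }));

// $unwind: one copy per vendor, and each copy still carries orders: 3.
const unwound = grouped.flatMap((g) => g.vendors.map((v) => ({ ...g, vendors: v })));

// A second $group with { orders: { $sum: "$orders" } } would add the copies together.
const reSummedOrders = unwound.reduce((sum, d) => sum + d.orders, 0);

console.log(withAddFields[0].orders, withAddFields[0].vendorCount); // 3 2
console.log(unwound.length, reSummedOrders); // 2 6
```

In the real $unwind pipeline, carrying such a field through the second $group with $first instead of $sum would keep the original value.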

1 Comment

this answer is identical to @Hett's answer which was added 18 months earlier.
-1

To identify accounts that use multiple payment sources:

  1. Use a $group stage to count the records for each account.
  2. Use a $match stage to keep only the accounts whose count is greater than 1.

db.payment_collection.aggregate([
    { $group: { "_id": "$account", "number": { $sum: 1 } } },
    { $match: { "number": { $gt: 1 } } }
]);

This will work perfectly fine.
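A plain-JavaScript sketch of what this pipeline computes, with hypothetical sample data. Because $sum: 1 counts documents, an account with two purchases from the same vendor (pqr in the sketch) also passes the $match; counting distinct vendors first, as the other answers do, and then applying the same $match would avoid that.

```javascript
// Hypothetical sample data; "pqr" buys from the same vendor twice.
const docs = [
  { account: "abc", vendor: "amazon" },
  { account: "abc", vendor: "overstock" },
  { account: "xyz", vendor: "ebay" },
  { account: "pqr", vendor: "amazon" },
  { account: "pqr", vendor: "amazon" },
];

// $group: count documents per account.
const counts = new Map();
for (const { account } of docs) {
  counts.set(account, (counts.get(account) || 0) + 1);
}

// $match: keep only groups whose number is greater than 1.
const matched = [...counts]
  .map(([account, number]) => ({ _id: account, number }))
  .filter((g) => g.number > 1);

console.log(matched.map((g) => g._id)); // abc and pqr
```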


-3
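db.UserModule.aggregate([
  // Count documents per (companyauthemail, email) pair
  { $group : { _id : { "companyauthemail" : "$companyauthemail", "email" : "$email" }, number : { $sum : 1 } } },
  // Count the distinct emails for each companyauthemail
  { $group : { _id : "$_id.companyauthemail", number : { $sum : 1 } } }
]);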

2 Comments

While this code snippet may be the solution, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
plus it's fundamentally the same as existing answers.
-5

An example

db.collection.distinct("example.item").forEach(function (value) {
    // Number of documents whose example.item equals this distinct value
    print(value + " ==>> " + db.collection.countDocuments({ "example.item": value }));
});

1 Comment

You should provide a description to describe why this works as a solution for the question. It's also very, very helpful to make the example code use the same data and variable context as the actual question. This answer would be considered "low quality" on StackOverflow; low quality answers tend to attract downvotes, and may get you banned from answering any more questions.
