12

Here's my problem:

Model:

{ application: "abc", date: Time.now, status: "1", user_id: [ id1, id2, id4] }

{ application: "abc", date: Time.yesterday, status: "1", user_id: [ id1, id3, id5] }

{ application: "abc", date: Time.yesterday-1, status: "1", user_id: [ id1, id3, id5] }

I need to count the unique number of user_ids in a period of time.

Expected result:

{ application: "abc", status: "1", unique_id_count: 5 }

I'm currently using the aggregation framework and counting the ids outside MongoDB.

{ $match: { application: "abc" } }, { $unwind: "$user_id" }, { $group: { _id: { status: "$status" }, users: { $addToSet: "$user_id" } } }

My arrays of user ids are very large, so I have to iterate over the dates or I'll hit the maximum document size limit (16 MB).

I could also $group by

{ year: { $year: "$date" }, month: { $month: "$date" }, day: { $dayOfMonth: "$date" } }

but I still hit the document size limit.

Is it possible to count the set size in MongoDB?

thanks

  • Do you have more than 16 MB of ids per user, or more than 16 MB of data across all of the records? If the latter condition exists, you can try flushing the result to an output collection. Commented Jan 28, 2013 at 18:13
  • The users array/set size is bigger than a thousand, and user ids are similar to object_ids (50b9d949816e6e37060005c2). The previous version was using map/reduce and an output collection. It was terribly slower. Counting in memory is faster than writing an output collection. Commented Jan 28, 2013 at 18:17
  • And how is the performance when you do a table scan and retrieve only the application and userId fields? Of course counting in memory is faster, but you have limitations with Mongo. As far as I know, if the output doesn't fit into memory, flushing to disk or making a table scan is your only option. Commented Jan 28, 2013 at 18:30
  • The performance is acceptable. I just wish for a way to count an array size and not return the entire content. Commented Jan 28, 2013 at 18:40

3 Answers

25

The following returns the number of unique users per status for the given application. It applies a second group operation to the result of the first one, using MongoDB's aggregation pipeline.

{ $match: { application: "abc" } },
{ $unwind: "$user_id" },
{ $group: { _id: "$status", users: { $addToSet: "$user_id" } } },
{ $unwind: "$users" },
{ $group: { _id: "$_id", count: { $sum: 1 } } }
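
To see what the two-pass unwind/group computes, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB code: `countUniqueUsersByStatus` is a hypothetical helper, and the sample documents reuse the `user_id` field from the question with placeholder id strings.

```javascript
// In-memory illustration only; MongoDB does this work server-side.
// Sample documents modelled on the question (id strings are placeholders).
const docs = [
  { application: "abc", status: "1", user_id: ["id1", "id2", "id4"] },
  { application: "abc", status: "1", user_id: ["id1", "id3", "id5"] },
  { application: "abc", status: "1", user_id: ["id1", "id3", "id5"] },
];

// Hypothetical helper mirroring:
// $match -> $unwind -> $group/$addToSet -> $unwind -> $group/$sum
function countUniqueUsersByStatus(documents, application) {
  const setsByStatus = new Map();
  for (const doc of documents) {
    if (doc.application !== application) continue; // $match
    for (const id of doc.user_id) {                // first $unwind
      if (!setsByStatus.has(doc.status)) setsByStatus.set(doc.status, new Set());
      setsByStatus.get(doc.status).add(id);        // $addToSet
    }
  }
  // The second $unwind plus {$sum: 1} amounts to counting each set's members.
  return [...setsByStatus].map(([status, ids]) => ({ _id: status, count: ids.size }));
}

const result = countUniqueUsersByStatus(docs, "abc");
console.log(result);
```

With the three sample documents, the only group is status "1" and its set holds five distinct ids.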

Hopefully a future MongoDB release will make this easier with an operator that returns the size of an array inside a projection, for example {$project: {id: "$_id", count: {$size: "$uniqueUsers"}}}. See https://jira.mongodb.org/browse/SERVER-4899

Cheers


2 Comments

This was added in version 2.5.3 (currently a development release).
Your example {$project: {id: "$_id", count: {$size: "$uniqueUsers"}}} worked for me in version 2.6. Thanks!
2

Sorry I'm a little late to the party. Simply grouping on the 'user_id' and counting the result with a trivial group works just fine and doesn't run into doc size limits.

[
    {$match: {application: 'abc', date: {$gte: startDate, $lte: endDate}}},
    {$unwind: '$user_id'},
    {$group: {_id: '$user_id'}},
    {$group: {_id: 'singleton', count: {$sum: 1}}}
];
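
The `startDate` and `endDate` values in the pipeline above are not defined in the snippet, so here is one way they might be supplied. This is only a sketch: the helper `buildUniqueUserCountPipeline` and the collection name `events` are my own assumptions, not names from the question.

```javascript
// Hypothetical helper: builds the pipeline above for a given application and date range.
function buildUniqueUserCountPipeline(application, startDate, endDate) {
  return [
    // keep only documents for this application inside the requested period
    { $match: { application: application, date: { $gte: startDate, $lte: endDate } } },
    // one document per (original document, user id) pair
    { $unwind: "$user_id" },
    // one document per distinct user id
    { $group: { _id: "$user_id" } },
    // collapse everything into a single counter document
    { $group: { _id: "singleton", count: { $sum: 1 } } },
  ];
}

// Example range: the last three days (an arbitrary choice for illustration).
const endDate = new Date();
const startDate = new Date(endDate.getTime() - 3 * 24 * 60 * 60 * 1000);
const pipeline = buildUniqueUserCountPipeline("abc", startDate, endDate);

// In the mongo shell this would be run as something like:
//   db.events.aggregate(pipeline)   // "events" is an assumed collection name
console.log(JSON.stringify(pipeline, null, 2));
```

Since each distinct id becomes its own small document before the final count, no single document has to hold the whole set of ids.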

1 Comment

It also doesn't satisfy the question: "I need to count the unique number of user_ids in a period of time". The OP already knows how to do it for each period of time.
1

Use $size to get the size of the set.

[
    {
        $match: {"application": "abc"}
    },
    {
        $unwind: "$user_id"
    },
    {
        $group: {
            "_id": "$status",
            // fields in $group other than _id must use an accumulator
            "application": {$first: "$application"},
            "unique_user_id": {$addToSet: "$user_id"}
        }
    },
    {
        $project:{
            "_id": "$_id",
            "application": "$application",
            "count": {$size: "$unique_user_id"}
        }
    }
]
