This is the document structure in MongoDB:

{ "_id" : ObjectId("9elesdf3lk3jppefll34d210"), "category" : "data1", "product" : "data" }
{ "_id" : ObjectId("9elesdf3lk3jppefll34d211"), "category" : "data2", "product" : "data" }
{ "_id" : ObjectId("9elesdf3lk3jppefll34d211"), "category" : "data1", "product" : "data" }

where category is indexed. I want to take a distinct count of the category field.

Currently I am using the following aggregation to get the count:

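db.collection.aggregate([
   // Stage 1: collapse the documents to one per distinct category
   { $group: { _id: "$category" } },
   // Stage 2: count the documents produced by stage 1
   { $group: { _id: 1, count: { $sum: 1 } } }
])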

This query gives me the correct count, but my database is growing day by day and the query takes longer and longer to execute. Is there a faster way to get the count?

  • Have you tested the performance of db.collection.distinct('category').length as an alternative? distinct can use an index but $group cannot. Commented Jul 24, 2016 at 4:01
  • @JohnnyHK I have more than 10 million documents and growing. Will it be able to process so much data at once? Commented Jul 24, 2016 at 6:39
  • As long as it's able to use an index, sure. Roughly how many different categories are there? Commented Jul 24, 2016 at 13:20
  • @mikhilmohanan You should start accepting answers when you want help from SO users in the future, as there are 8 questions from you out there with answers but you didn't accept any of them ... Commented Jul 29, 2016 at 4:34
  • @DAXaholic Thank you for your comment. I'll take care of it in the future. Commented Aug 1, 2016 at 13:41

1 Answer

As already pointed out by JohnnyHK, use db.collection.distinct if possible, because it can take advantage of an index on the field.

So in your case db.collection.distinct('category').length should be pretty fast.
If you still suffer from performance issues then have a look at

db.collection.explain().distinct('category')  

to see the execution plan of the query and act on it, or add the output to this question so that we can see whether your index is actually used.

2 Comments

I have executed the query. Can you help me identify which field shows whether my index is being used?
If you execute db.collection.explain().distinct('category'), you should see a 'DISTINCT_SCAN' stage somewhere under queryPlanner.winningPlan. If you see a 'COLLSCAN' stage instead, your index is not used.
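
To make the check described above less error-prone, here is a minimal sketch in plain JavaScript that walks the winningPlan tree and reports which of the two stages it finds. The helper names collectStages and indexUsage are hypothetical, and the sample object is hand-written for illustration, not real server output. The sketch assumes the plan nests its children under inputStage or inputStages; the exact shape of explain output can differ between server versions.

```javascript
// Hypothetical helper: collect every stage name found in a plan tree.
// Assumes child plans are nested under `inputStage` or `inputStages`.
function collectStages(plan, found = []) {
  if (!plan || typeof plan !== "object") return found;
  if (typeof plan.stage === "string") found.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, found);
  if (Array.isArray(plan.inputStages)) {
    for (const child of plan.inputStages) collectStages(child, found);
  }
  return found;
}

// Hypothetical helper: summarize whether the winning plan uses the index.
function indexUsage(explainOutput) {
  const planner = explainOutput && explainOutput.queryPlanner;
  const stages = collectStages(planner && planner.winningPlan);
  if (stages.includes("DISTINCT_SCAN")) return "index used (DISTINCT_SCAN)";
  if (stages.includes("COLLSCAN")) return "index not used (COLLSCAN)";
  return "unrecognized plan: " + stages.join(" > ");
}

// Hand-written sample for illustration only; not real explain output.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "PROJECTION",
      inputStage: { stage: "DISTINCT_SCAN", indexName: "category_1" }
    }
  }
};

console.log(indexUsage(sampleExplain));
```

In the shell, the same function could presumably be called as indexUsage(db.collection.explain().distinct('category')), provided the two helpers are defined in that session first.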
