1

I am new to the mongo database. Using Robo3t software
I have to find out duplicate values inside an array based on channel_id.
I did a research and found that aggregation needs to be used to do grouping and find respective count.
I have developed the following query but results are not as expected.

Sample Documents:

{
    "_id" : ObjectId("59b674d141b47e5401897d31"),
    "subscribed_channels" : [ 
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        }, 
        {
            "channel_id" : "1002",
            "channel_name" : "StarGold",
            "channelPrice":"75"
        }, 
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        },
        {
            "channel_id" : "1003",
            "channel_name" : "SetMax",
            "channelPrice":"80"
        }
    ],
    "viewer_account_id" : "59b6745b41b47e5401143b3d",
    "public_id_type" : "PHONE_NUMBER",
    "viewer_id" : "+919322264403",
    "role" : "CONSUMER",
    "active" : true,
    "date_time_created" : NumberLong(1505129681330),
    "date_time_modified" : NumberLong(1569320824387)
}

{
        "_id" : ObjectId("59b674d141b47e5401897d31"),
        "subscribed_channels" : [ 
            {
                "channel_id" : "1001",
                "channel_name" : "StarPlus",
                "channelPrice":"100"
            }, 
            {
                "channel_id" : "1002",
                "channel_name" : "StarGold",
                "channelPrice":"75"
            }, 
            {
                "channel_id" : "1001",
                "channel_name" : "StarPlus",
                "channelPrice":"100"
            },
             {
                "channel_id" : "1001",
                "channel_name" : "StarPlus",
                "channelPrice":"100"
            }
        ],
        "viewer_account_id" : "59b6745b41b47e5401143c56",
        "public_id_type" : "PHONE_NUMBER",
        "viewer_id" : "+919322264404",
        "role" : "CONSUMER",
        "active" : true,
        "date_time_created" : NumberLong(1505129681330),
        "date_time_modified" : NumberLong(1569320824387)
    }

Above are just 2 records of document viewers

Query :

db.getCollection('viewers').aggregate([ 
        {
                    "$group" : 
                    {_id:{
                        //viewer_id:"$consumer_id",
                        enterprise_id:"$subscribed_channels.channel_id",
                         }, 
                         "viewer_id": {
                             $first: "$viewer_id"
                        },
                        count:{$sum:1}
                        }},

                        {
                          "$match": {"count": { "$gt": 1 }}
                        }
                 ]) 

Actual Output :

{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1001", 
            "1002",
            "1003"
        ]
    },
    "consumer_id" : "+919322264403",
    "count" : 2.0
}
{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1002", 
            "1001",
            "1001
        ]
    },
    "consumer_id" : "+919322264404",
    "count" : 2.0
}

Expected Output :

I want to group based on subscribed_channels.channel_id and get a count respectively

{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1001", 
            "1002",
            "1003"
        ]
    },
    "consumer_id" : "+919322264403",
    "count" : 2.0
}
{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1001", 
            "1001",
            "1002
        ]
    },
    "consumer_id" : "+919322264404",
    "count" : 3.0
}

Grouping is not happening based on channel_id, also the count is incorrect.
The count is not even giving me no of channel-id subscribed, also not giving duplicate channel_ids.

Please guide me in building a query that gives the correct result.

5
  • So if you want duplicates first doc will have 1 & second will have 2, If I'm not wrong is that correct or what you've given is right ? Cause in 1st doc ["1001", "1002", "1003"] will be unique only duplicate is another 1001.. Then if you've this [ "1002", "1002", "1001", "1001" ] do you consider it as 4 duplicates ? Commented Mar 18, 2020 at 14:46
  • Hi @whoami. Thank you for replying. i want results as per documents. 1st doc would give count as 2 since there are two 1001 and 2nd doc should give count as 3 since there are three 1001. Also as per your understanding if i get first doc giving 1 & second doc giving 2 it will work.Please let me know if any other clarifications are needed, i will update my question Commented Mar 18, 2020 at 15:03
  • Hi @whoami. I want to highlight documents which contain duplicates based on channel_id . Can you give me a head start for query ? Commented Mar 18, 2020 at 15:10
  • I feel giving count as 1 for 1st & 2 for 2nd is perfect which is correct cause those are the no.of elements duplicated (Also if you just need docs which has duplicates you don't need count, Is that your actual question ? Or do you want all docs & a added field(some field like hasDups : true) for those docs which has duplicates ?) Commented Mar 18, 2020 at 15:12
  • 1
    @whoami , Yes thank for your suggestion. Yes you are right count as 1 for 1st and 2 for second is perfect and would suit my requirement as i would know which channel_ids are repeated within a document. Also a added field will also suffice but option1 looks more prominent. Commented Mar 18, 2020 at 15:19

1 Answer 1

1

Try below query :

Query :

db.collection.aggregate([
  /** project only needed fields & transform fields as you like */
  {
    $project: {
      customer_id: "$viewer_id",
      enterprise_id: "$subscribed_channels.channel_id",
      count: {
        /** Subtract size of original array & newly formed array which has unique values to get count of duplicates */
        $subtract: [
          {
            $size: "$subscribed_channels.channel_id" // get size of original array
          },
          {
            $size: {
              $setUnion: ["$subscribed_channels.channel_id", []] // This will give you an array with unique elements & get size of it
            }
          }
        ]
      }
    }
  }
]);

Test : MongoDB-Playground

Sign up to request clarification or add additional context in comments.

14 Comments

hi @whoami . executing the above query gives error as Error: command failed: { "ok" : 0, "errmsg" : "The argument to $size must be an array, but was of type: missing", "code" : 17124, "codeName" : "Location17124" } : aggregate failed . are my documents corrupt ?
@AjinkyaKarode : Yes, I think some of your docs doesn't have subscribed_channels as array. Can you see if that's correct ? What do you wanted to do on those ?
hi @whoami Sorry, i just verified all documents and yes in some docs there is no subscribed_channel field since that viewer has no subscription. How can i handle such docs ?
@AjinkyaKarode : what do you've to do to those docs ? Do you want to remove those from result ?
Hi @whoami .Thanks for all your efforts,Just verfied the output. mongoplayground.net/p/GxtKlE0fkst and i got count more than 0. I ll check out the website you recommended, but also if i want to learn from you is it possible?any email id or linkedin?
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.