Aggregate on array of embedded documents

Question

I have a MongoDB collection with multiple documents. Each document has an array containing multiple subdocuments (embedded documents, I guess?). Each of these subdocuments has this format:

    {
        "name": string,
        "count": integer
    }

Now I want to aggregate these subdocuments to find:

1. The top X counts and their names.
2. The same as 1., but the names have to match a regex before sorting and limiting.

I have already tried the following for 1. It does return the top X, but unordered, so I would have to sort them again, which seems somewhat inefficient.

    [{
        $match: {
            _id: id
        }
    }, {
        $unwind: {
            path: "$array"
        }
    }, {
        $sort: {
            'count': -1
        }
    }, {
        $limit: x
    }]

Since I'm rather new to MongoDB, this is pretty confusing for me. Happy for any help. Thanks in advance.

georgbc · Accepted Answer · 2018-12-20 17:47:47Z


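The $sort has to use the full path including the array name ("array.count" in your pipeline, "students.count" in the example below). After $unwind, each output document holds a single element under the array field, so there is no top-level count field. Sorting on 'count' therefore sorts on a field that is missing from every document and does not order the results by count; the following $limit then keeps whichever X elements come first rather than the X highest counts. Sorting on the correct path avoids any additional sort later on.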
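To see why the path needs the array name, here is a small plain-JavaScript model of what $unwind does to one document. It is only an illustration of the output shape, not MongoDB's implementation; the `unwind` helper and the string `_id` are hypothetical stand-ins.

```javascript
// Plain-JavaScript model of $unwind on a single document (illustration only).
// Assumes doc[field] is an array; an empty array yields no documents,
// which matches $unwind's default behaviour.
function unwind(doc, field) {
  // One output document per array element: the array field is replaced by
  // that single element and every other field is copied unchanged.
  return doc[field].map((element) => ({ ...doc, [field]: element }));
}

const doc = {
  _id: "example-id", // hypothetical stand-in for an ObjectId
  students: [
    { count: 4, name: "Ann" },
    { count: 7, name: "Brad" },
  ],
};

const unwound = unwind(doc, "students");
console.log(unwound[0].count);          // undefined: no top-level "count" field
console.log(unwound[0].students.count); // 4: the value lives under the array name
```

Since every unwound document lacks a top-level `count`, a sort on `count` has nothing to compare, while a sort on `students.count` does.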

Given the following document to work with:

    {
      students: [{
        count: 4,
        name: "Ann"
      }, {
        count: 7,
        name: "Brad"
      }, {
        count: 6,
        name: "Beth"
      }, {
        count: 8,
        name: "Catherine"
      }]
    }

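As an example, the following aggregation query matches any name containing the letter "h" or the letter "e" (the character class /[he]/ matches either one). This $match needs to come after the $unwind step: before unwinding, a match on "students.name" selects whole documents whose array has at least one matching element, whereas after unwinding it filters the individual elements, keeping only the ones you need.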

    db.tests.aggregate([
      {$match: {
        _id: ObjectId("5c1b191b251d9663f4e3ce65")
      }},
      {$unwind: {
        path: "$students"
      }},
      {$match: {
        "students.name": /[he]/
      }},
      {$sort: {
        "students.count": -1
      }},
      {$limit: 2}
    ])

This is the output for the input above:

    { "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
    { "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 6, "name" : "Beth" } }

Both names match the regex (here they happen to contain both letters), and the output is sorted by count from high to low.
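With the example data, "contains h or e" and "contains both h and e" select the same names, so the difference is easy to miss. The sketch below tests both patterns in plain JavaScript on the example names plus two hypothetical extras ("Sarah", "Jake") that contain only one of the letters. MongoDB uses PCRE for regular expressions, which supports lookaheads, so the both-letters pattern should also work in the $match stage, but treat that as something to verify against your server version.

```javascript
// Names from the example document plus two hypothetical extras that
// contain only one of the two letters.
const names = ["Ann", "Brad", "Beth", "Catherine", "Sarah", "Jake"];

// Character class: matches a name containing an "h" OR an "e".
const eitherLetter = /[he]/;

// Two lookaheads anchored at the start: matches only names containing
// both an "h" AND an "e", in any order.
const bothLetters = /^(?=.*h)(?=.*e)/;

const matchesEither = names.filter((n) => eitherLetter.test(n));
const matchesBoth = names.filter((n) => bothLetters.test(n));

console.log(matchesEither); // [ 'Beth', 'Catherine', 'Sarah', 'Jake' ]
console.log(matchesBoth);   // [ 'Beth', 'Catherine' ]
```

Both patterns are case-sensitive as written; add the `i` flag if "H" or "E" should count too.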

When setting the limit to 1, the output is limited to:

    { "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }

In this case, only the highest count among the matching names is kept.
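For readers who want to reason about the stage order without a database, the whole pipeline can be modelled in memory with ordinary array methods. This is a sketch under the assumption that one document is processed; `topStudents` is a hypothetical helper, and the real work is done server-side by MongoDB.

```javascript
// In-memory model of the pipeline: $unwind -> $match (regex) -> $sort -> $limit.
// namePattern must not use the g flag, otherwise RegExp.test keeps state.
function topStudents(doc, namePattern, limit) {
  return doc.students
    // $unwind: one document per array element
    .map((student) => ({ _id: doc._id, students: student }))
    // $match: { "students.name": namePattern }
    .filter((d) => namePattern.test(d.students.name))
    // $sort: { "students.count": -1 } (highest count first)
    .sort((a, b) => b.students.count - a.students.count)
    // $limit: limit
    .slice(0, limit);
}

const exampleDoc = {
  _id: "5c1b191b251d9663f4e3ce65", // string stand-in for the ObjectId
  students: [
    { count: 4, name: "Ann" },
    { count: 7, name: "Brad" },
    { count: 6, name: "Beth" },
    { count: 8, name: "Catherine" },
  ],
};

const topTwo = topStudents(exampleDoc, /[he]/, 2);
const topOne = topStudents(exampleDoc, /[he]/, 1);
console.log(topTwo.map((d) => `${d.students.name}: ${d.students.count}`)); // [ 'Catherine: 8', 'Beth: 6' ]
console.log(topOne.map((d) => d.students.name)); // [ 'Catherine' ]
```

The results line up with the shell output shown above for limits of 2 and 1.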

Edit for the bonus question from the comments:

Yes, the first $match can be changed to filter on a specific university:

      {$match: {
        university: "University X"
      }},

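That will return one or more matching documents (for example, if you have one document per year), and the rest of the aggregation steps remain valid. $unwind, $sort and $limit then operate on the elements of all matched documents together, so you get the top X across all of them.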
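The point that sorting and limiting happen across all matched documents can be checked with an in-memory sketch. The three documents below are hypothetical (one per university and academic year, using the `university` and `academic_year` field names from this answer); array methods stand in for the pipeline stages.

```javascript
// Hypothetical documents: two for University X, one for University Y.
const docs = [
  { university: "University X", academic_year: "2017-2018",
    students: [{ count: 9, name: "Heather" }, { count: 2, name: "Beth" }] },
  { university: "University X", academic_year: "2018-2019",
    students: [{ count: 8, name: "Catherine" }, { count: 6, name: "Beth" }] },
  { university: "University Y", academic_year: "2018-2019",
    students: [{ count: 10, name: "Ethan" }] },
];

const result = docs
  // $match: { university: "University X" }
  .filter((d) => d.university === "University X")
  // $unwind: "$students" (flatMap merges the elements of every matched document)
  .flatMap((d) => d.students.map((s) => ({ ...d, students: s })))
  // $match on the name, $sort descending, $limit 2: all on the combined stream
  .filter((d) => /[he]/.test(d.students.name))
  .sort((a, b) => b.students.count - a.students.count)
  .slice(0, 2);

// Heather (9, 2017-2018) then Catherine (8, 2018-2019); Ethan's higher count
// is excluded by the university filter.
console.log(result.map((d) => [d.academic_year, d.students.name, d.students.count]));
```

The two results come from different source documents, which is the behaviour described above.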

The following $match would retrieve the students of the given university for a given academic year, in case that is needed:

      {$match: {
        university: "University X",
        academic_year: "2018-2019"
      }},

That should narrow it down to get the correct documents.
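If both variants from the question (top X with and without a name filter) are needed, one option is to build the pipeline in code. The sketch below is a hypothetical helper, `buildTopCountsPipeline`, that only assembles the stage array; the field names are the ones used in this answer and are assumptions about your schema. The resulting array could then be passed to `db.tests.aggregate(...)` in the shell or to `collection.aggregate(...).toArray()` in the Node.js driver.

```javascript
// Hypothetical helper: assembles the aggregation stages discussed above.
// Field names ("students", "university", "academic_year") follow this answer.
function buildTopCountsPipeline({ filter, namePattern, limit }) {
  const pipeline = [
    { $match: filter },                // select the source document(s)
    { $unwind: { path: "$students" } } // one document per array element
  ];
  if (namePattern) {
    // Question 2 only: filter individual elements after unwinding.
    pipeline.push({ $match: { "students.name": namePattern } });
  }
  pipeline.push(
    { $sort: { "students.count": -1 } }, // full path including the array name
    { $limit: limit }
  );
  return pipeline;
}

// Question 1: top 3 counts for one university and academic year.
const top3 = buildTopCountsPipeline({
  filter: { university: "University X", academic_year: "2018-2019" },
  limit: 3,
});

// Question 2: same idea, but only names matching a regex.
const top3Matching = buildTopCountsPipeline({
  filter: { university: "University X" },
  namePattern: /[he]/,
  limit: 3,
});

console.log(JSON.stringify(top3));
```

Keeping the regex stage optional means both questions share one pipeline definition and differ only in their arguments.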

edited Dec 20, 2018 at 17:47

answered Dec 20, 2018 at 4:43

georgbc



2 Comments

Maku Over a year ago

Thank you! As a bonus question: can I change the first $match to match a key that exists once in every document instead of the _id? Let's say each document has a key university and I only want to find students from one specific university.

georgbc Over a year ago

Sure. Just edited the answer to include the bonus question.
