I have a MongoDB collection with multiple documents. Each document has an array with multiple subdocuments (or embedded documents, I guess?). Each of these subdocuments has this format:

{
    "name": string,
    "count": integer
}

Now I want to aggregate these subdocuments to find

  1. The top X counts and their name.
  2. Same as 1. but the names have to match a regex before sorting and limiting.

I have already tried the following for 1. It does return the top X, but unordered, so I'd have to sort them again, which seems somewhat inefficient.

[{
    $match: {
        _id: id
    }
}, {
    $unwind: {
        path: "$array"
    }
}, {
    $sort: {
        'count': -1
    }
}, {
    $limit: x
}]

Since I'm rather new to MongoDB, this is pretty confusing for me. Happy for any help. Thanks in advance.

1 Answer

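After "$unwind", each array element still lives under the array's field name, so the sort key has to include that name (for example 'array.count' instead of 'count'). Sorting on a top-level 'count' that does not exist gives no meaningful order, which is why the results came back unordered. With the full path, no additional sort is needed later on.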
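To make that concrete, here is a rough plain-JavaScript sketch (not MongoDB code) of what "$unwind" emits. The field name `array` comes from the question; the `_id` and the element values are made up for illustration.

```javascript
// Hypothetical document shaped like the one described in the question.
const doc = {
  _id: 1,
  array: [
    { name: "a", count: 2 },
    { name: "b", count: 5 },
  ],
};

// Rough in-memory equivalent of {$unwind: {path: "$array"}}:
// one output document per element, with the element stored under "array".
const unwound = doc.array.map((element) => ({ ...doc, array: element }));

console.log(unwound[0].count);       // undefined: no top-level "count" exists
console.log(unwound[0].array.count); // 2
```

Since the count sits at "array.count" in every unwound document, that is the path the "$sort" stage has to use.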

Given the following document to work with:

    {
      students: [{
        count: 4,
        name: "Ann"
      }, {
        count: 7,
        name: "Brad"
      }, {
        count: 6,
        name: "Beth"
      }, {
        count: 8,
        name: "Catherine"
      }]
    }

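As an example, the following aggregation query matches any name containing the letter "h" or "e" (the regex /[he]/ is a character class, so either letter is enough). This "$match" has to come after the "$unwind" step so that it filters individual array elements; placed before it, it would keep the whole document, with every student, as soon as one name matched.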

    db.tests.aggregate([
      {$match: {
        _id: ObjectId("5c1b191b251d9663f4e3ce65")
      }},
      {$unwind: {
        path: "$students"
      }},
      {$match: {
        "students.name": /[he]/
      }},
      {$sort: {
        "students.count": -1
      }},
      {$limit: 2}
    ])

This is the output given the above mentioned input:

    { "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
    { "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 6, "name" : "Beth" } }

Both names match the regex (each happens to contain both an "h" and an "e"), and the output is sorted by count from high to low.
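One thing worth checking is what /[he]/ actually accepts: it is a character class, so a name with only one of the two letters also matches. The sketch below tests this in plain JavaScript. The extra name "Hugh" is made up to show the difference, and the lookahead pattern is only one assumed way to require both letters; MongoDB's regex engine should accept lookaheads too, but verify it on your version.

```javascript
// Names from the example document, plus a made-up "Hugh" (has "h" but no "e").
const names = ["Ann", "Brad", "Beth", "Catherine", "Hugh"];

// Character class: the name contains "h" OR "e".
const either = names.filter((name) => /[he]/.test(name));

// Assumed alternative: two lookaheads require BOTH letters, in any order.
const both = names.filter((name) => /^(?=.*h)(?=.*e)/.test(name));

console.log(either); // [ 'Beth', 'Catherine', 'Hugh' ]
console.log(both);   // [ 'Beth', 'Catherine' ]
```

For the four students in the example the two patterns give the same result, so the output above is unaffected.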

When setting the limit to 1, the output is limited to:

    { "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }

In this case only the highest count has been kept after having matched the names.
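Both cases from the question (top X with and without a name filter) differ only in the second "$match" stage. A minimal sketch of a helper that builds either pipeline follows; the function name `topCountsPipeline` and its parameters are hypothetical, and the field name `students` is taken from the example document.

```javascript
// Hypothetical helper: builds the pipeline for "top x counts",
// optionally keeping only elements whose name matches nameRegex.
function topCountsPipeline(id, x, nameRegex) {
  const pipeline = [
    { $match: { _id: id } },
    { $unwind: { path: "$students" } },
  ];
  if (nameRegex) {
    // Filters individual elements, so it has to come after $unwind.
    pipeline.push({ $match: { "students.name": nameRegex } });
  }
  pipeline.push(
    { $sort: { "students.count": -1 } }, // full path, highest count first
    { $limit: x }
  );
  return pipeline;
}

// Case 1: top 2 counts, no name filter ("some-id" is a placeholder).
const top2 = topCountsPipeline("some-id", 2);
// Case 2: top 2 counts among names matching a regex.
const top2Filtered = topCountsPipeline("some-id", 2, /[he]/);
```

In the mongo shell this would be used as db.tests.aggregate(topCountsPipeline(ObjectId("5c1b191b251d9663f4e3ce65"), 2)), with the real _id in place of the placeholder string.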

===================== Edit for the extra question:

Yes, the first $match can be changed to filter on specific universities.

      {$match: {
        university: "University X"
      }},

That will give one or more matching documents (in case you have a document per year or so) and the rest of the aggregation steps would still be valid.

If needed, the following match retrieves only the documents for a given university and a given academic year.

      {$match: {
        university: "University X",
        academic_year: "2018-2019"
      }},

That should narrow it down to get the correct documents.

2 Comments

Thank you! As a bonus question: can I change the first $match to match a key that exists once in every document instead of the _id? Let's say each document has a key "university" and I only want to find students from one specific university.
Sure. Just edited the answer to include the bonus question.
