
Problem 1

I have a collection named recipe in which every document has an array field ingredients. I want to count the items in that array and write the result into a new field ingredient_count.

Problem 2

There is also a collection named ingredient. Its documents have a count field that should hold the total number of uses across all recipes.

My Current Approach

My solution right now is a script that aggregates over the collection and updates all documents one by one:

// PROBLEM 1: update recipe documents
db.recipe.aggregate(
    [
        {
            $project: {
                numberOfIngredients: { $size: "$ingredients" }
            }
        }
    ]
).forEach(function(recipe) {
    db.recipe.updateOne(
        { _id: recipe._id },
        { $set: { ingredient_count: recipe.numberOfIngredients } }
    )
});

// PROBLEM 2: update ingredient documents
db.ingredient.find().snapshot().forEach(function(ingredient) {
    db.ingredient.updateOne(
        { _id: ingredient._id },
        { $set: { count: db.recipe.count({ ingredients: { $in: [ingredient.name] } }) } }
    )
});

This is terribly slow. Any idea how to do this more efficiently?

Answer


For both problems, you can use aggregations alone, each writing its output to a new collection that then replaces the existing one:

Problem 1

The aggregation contains one $project stage that counts the ingredients and lists the fields to keep:

db.recipe.aggregate([{
    $project: {
        ingredients: 1,
        numberOfIngredients: { $size: "$ingredients" }
    }
}, {
    $out: "recipeNew"
}])

That gives you:

{ "_id" : ObjectId("58155bc09c924e717c5c4240"), "ingredients" : [......], "numberOfIngredients" : 5 }
{ "_id" : ObjectId("58155bc19c924e717c5c4241"), "ingredients" : [......], "numberOfIngredients" : 3 }

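The result of the aggregation is written to a new collection, recipeNew, which can then replace the existing recipe collection (for example with db.recipeNew.renameCollection("recipe", true), where true drops the old recipe collection first). If you want the field name from the question, rename numberOfIngredients to ingredient_count in the $project.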
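To make the effect of the $project stage concrete, here is a plain JavaScript sketch (no database involved) of what it computes for each document. The sample recipes and the function name projectIngredientCount are hypothetical. Like $project, it keeps only _id and the listed fields, so any other field (title here) is dropped, which is the first drawback mentioned further down.

```javascript
// Plain-JavaScript sketch of the $project stage above (no MongoDB involved).
// Hypothetical sample recipes; "title" stands in for any field not listed in $project.
const recipes = [
  { _id: 1, title: "Pancakes", ingredients: ["flour", "egg", "milk"] },
  { _id: 2, title: "Omelette", ingredients: ["egg", "butter"] },
];

// Mimics { $project: { ingredients: 1, numberOfIngredients: { $size: "$ingredients" } } }:
// _id is kept by default, ingredients is kept, and the array length is added.
function projectIngredientCount(doc) {
  if (!Array.isArray(doc.ingredients)) {
    // $size raises an error when its argument is missing or not an array
    throw new TypeError(`ingredients is not an array in document ${doc._id}`);
  }
  return {
    _id: doc._id,
    ingredients: doc.ingredients,
    numberOfIngredients: doc.ingredients.length,
  };
}

// Stand-in for the documents that $out would write to recipeNew
const recipeNew = recipes.map(projectIngredientCount);
console.log(recipeNew);
```

Because $size fails on a missing or non-array field, documents without an ingredients array would make the real aggregation error out, so it may be worth checking the data first.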

Problem 2

The aggregation contains:

  • one $unwind to deconstruct the ingredients array into one document per ingredient
  • one $group that groups by ingredient _id and sums the occurrences of each ingredient
  • one $lookup that joins the ingredients collection to retrieve all fields of each ingredient
  • one $unwind to flatten the array of joined ingredient documents (preserveNullAndEmptyArrays keeps counts that have no matching ingredient)
  • one $project to select the fields to keep
  • one $out to write the result to a new collection
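The first two stages do the actual counting. As a rough plain-JavaScript analogy (not how the server implements it), $unwind turns each recipe into one document per array element, and $group with $sum: 1 tallies how often each value occurs. The sample data are hypothetical, and plain strings stand in for the ingredient _ids. One difference from the question's count() loop: an ingredient listed twice in the same recipe is counted twice here, whereas count() counts matching recipes.

```javascript
// Plain-JavaScript sketch of { $unwind: "$ingredients" } followed by
// { $group: { _id: "$ingredients", IngredientsNumber: { $sum: 1 } } }.
// Hypothetical sample data; the strings stand in for ingredient _ids.
const recipes = [
  { _id: 1, ingredients: ["flour", "egg", "milk"] },
  { _id: 2, ingredients: ["egg", "butter"] },
  { _id: 3, ingredients: ["flour", "egg"] },
];

// $unwind: one output document per array element;
// a missing or empty array yields nothing, as with $unwind's default behaviour
const unwound = recipes.flatMap((r) =>
  (r.ingredients || []).map((ing) => ({ _id: r._id, ingredients: ing }))
);

// $group: one bucket per distinct ingredient, adding 1 per unwound document
const counts = new Map();
for (const doc of unwound) {
  counts.set(doc.ingredients, (counts.get(doc.ingredients) || 0) + 1);
}

const grouped = [...counts].map(([id, n]) => ({ _id: id, IngredientsNumber: n }));
console.log(grouped);
```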

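The query is as follows. It assumes the ingredient collection is named ingredients (the question calls it ingredient) and that each recipe's ingredients array holds ingredient _ids; if the array holds names, as the question's count() query suggests, adjust from and foreignField accordingly: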

db.recipe.aggregate([{
    $unwind: "$ingredients"
}, {
    $group: { _id: "$ingredients", IngredientsNumber: { $sum: 1 } }
}, {
    $lookup: {
        from: "ingredients",
        localField: "_id",
        foreignField: "_id",
        as: "ingredientsDB"
    }
}, {
    $unwind: { path: "$ingredientsDB", preserveNullAndEmptyArrays: true }
}, {
    $project: {
        ingredientsNumber: "$IngredientsNumber",
        name: "$ingredientsDB.name"
    }
}, {
    $out: "ingredientsTemp"
}])

That gives:

{ "_id" : ObjectId("5812caaeb4829937f4599b54"), "ingredientsNumber" : 2, "name" : "ingredients5" }
{ "_id" : ObjectId("5812caaeb4829937f4599b53"), "ingredientsNumber" : 1, "name" : "ingredients4" }
{ "_id" : ObjectId("5812caaeb4829937f4599b52"), "ingredientsNumber" : 2, "name" : "ingredients3" }
{ "_id" : ObjectId("5812caaeb4829937f4599b51"), "ingredientsNumber" : 1, "name" : "ingredients2" }
{ "_id" : ObjectId("5812caaeb4829937f4599b50"), "ingredientsNumber" : 2, "name" : "ingredients1" }

The drawbacks of this solution:

  • It uses $project, so you have to list every field you want to keep.
  • The new ingredientsTemp collection contains only the ingredients that actually appear in recipes, so an additional aggregation with a $lookup is needed to join the existing collection with the one produced by that aggregation.

The following joins the existing ingredients collection with the one created above (add an $out stage, as in the previous queries, if you want to store the result in a new collection):

db.ingredients.aggregate([{
    $lookup: {
        from: "ingredientsTemp",
        localField: "_id",
        foreignField: "_id",
        as: "ingredientsDB"
    }
}, {
    $unwind: { path: "$ingredientsDB", preserveNullAndEmptyArrays: true }
}, {
    $project: {
        name: "$name",
        ingredientsNumber: "$ingredientsDB.ingredientsNumber"
    }
}])

Then you get the following (ingredients6 is used by no recipe, so it has no ingredientsNumber field):

{ "_id" : ObjectId("5812caaeb4829937f4599b50"), "name" : "ingredients1", "ingredientsNumber" : 2 }
{ "_id" : ObjectId("5812caaeb4829937f4599b51"), "name" : "ingredients2", "ingredientsNumber" : 1 }
{ "_id" : ObjectId("5812caaeb4829937f4599b52"), "name" : "ingredients3", "ingredientsNumber" : 2 }
{ "_id" : ObjectId("5812caaeb4829937f4599b53"), "name" : "ingredients4", "ingredientsNumber" : 1 }
{ "_id" : ObjectId("5812caaeb4829937f4599b54"), "name" : "ingredients5", "ingredientsNumber" : 2 }
{ "_id" : ObjectId("5812caaeb4829937f4599b57"), "name" : "ingredients6" }
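This second aggregation behaves like a left outer join: $lookup collects the matching ingredientsTemp documents into an array, and $unwind with preserveNullAndEmptyArrays: true keeps ingredients that had no match, which is why ingredients6 ends up without an ingredientsNumber. Below is a plain-JavaScript sketch of that behaviour. The data are a shortened version of the sample output above, with short strings in place of the ObjectIds (an assumption for brevity).

```javascript
// Plain-JavaScript sketch of the final $lookup + $unwind + $project.
// Short string ids replace the ObjectIds from the sample output.
const ingredients = [
  { _id: "b50", name: "ingredients1" },
  { _id: "b51", name: "ingredients2" },
  { _id: "b57", name: "ingredients6" }, // used by no recipe
];
const ingredientsTemp = [
  { _id: "b50", ingredientsNumber: 2, name: "ingredients1" },
  { _id: "b51", ingredientsNumber: 1, name: "ingredients2" },
];

const joined = ingredients.map((ing) => {
  // $lookup: every ingredientsTemp document whose _id equals this _id
  const matches = ingredientsTemp.filter((t) => t._id === ing._id);
  // $unwind with preserveNullAndEmptyArrays: true keeps unmatched ingredients;
  // _id values are unique, so there is at most one match here
  const match = matches[0];
  const out = { _id: ing._id, name: ing.name };
  // $project on a missing path omits the field rather than writing null
  if (match !== undefined) out.ingredientsNumber = match.ingredientsNumber;
  return out;
});
console.log(joined);
```

If you would rather store 0 for unused ingredients, an expression such as { $ifNull: ["$ingredientsDB.ingredientsNumber", 0] } in the $project is one option.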

The advantages:

  • It uses only aggregation, with no per-document round trips between the shell and the server, so it should be much faster.

Comment

This is golden, thanks for the great explanation and examples.
