
I'm new to Mongo. I have found lots of examples of removing duplicates from arrays of strings using the aggregation framework, but I'm wondering whether it's possible to remove duplicates from an array of objects based on a field in the object. For example:

{
"_id" : ObjectId("5e82661d164941779c2380ca"),
"name" : "something",
"values" : [
    {
        "id" : 1,
        "val" : "x"
    },
    {
        "id" : 1,
        "val" : "x"
    },
    {
        "id" : 2,
        "val" : "y"
    },
    {
        "id" : 1,
        "val" : "xxxxxx"
    }
]
}

Here I'd like to remove duplicates based on the id field, so I would end up with:

{
"_id" : ObjectId("5e82661d164941779c2380ca"),
"name" : "something",
"values" : [
    {
        "id" : 1,
        "val" : "x"
    },
    {
        "id" : 2,
        "val" : "y"
    }
]
}

Picking the first (or any) object with a given id works; I just want to end up with one object per id. Is this doable with the aggregation framework? Or, even outside the aggregation framework, I'm just looking for a clean way to do this. I need to do this across many documents in the collection, which seems like a good use case for the aggregation framework, but as I mentioned, I'm a newbie here. Thanks.

2 Answers


You can get the desired result in two ways.

Classic

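Flatten the array with $unwind, remove duplicates (keeping the first occurrence of each id within each document), then group the surviving elements back into one document per original _id: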

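db.collection.aggregate([
  // One output document per element of "values"
  {
    $unwind: "$values"
  },
  // Key on (document _id, element id) so that documents sharing an id are not merged
  {
    $group: {
      _id: { docId: "$_id", valId: "$values.id" },
      name: { $first: "$name" },
      values: { $first: "$values" }   // keep the first element seen for this id
    }
  },
  // Rebuild one document per original _id
  {
    $group: {
      _id: "$_id.docId",
      name: { $first: "$name" },
      values: { $push: "$values" }
    }
  }
])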
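To see what the three stages do, here is a small plain-JavaScript simulation of the classic pipeline on two sample documents. It is only an in-memory model for illustration, not MongoDB code: the helper name `classicDedupe` and the second sample document (`_id: "b"`) are hypothetical, and the ObjectIds are replaced by strings. It mirrors $unwind, a $group keyed on (document _id, values.id) with $first, and a second $group that $pushes the survivors back per document.

```javascript
// In-memory model of the "classic" pipeline; not MongoDB code.
// classicDedupe is a hypothetical helper name used only for illustration.
function classicDedupe(docs) {
  // $unwind: one record per array element, carrying the parent fields
  const unwound = [];
  for (const doc of docs) {
    for (const v of doc.values) {
      unwound.push({ _id: doc._id, name: doc.name, values: v });
    }
  }

  // First $group: key on (document _id, element id), keep the first element seen
  const firstPerKey = new Map();
  for (const rec of unwound) {
    const key = JSON.stringify([rec._id, rec.values.id]);
    if (!firstPerKey.has(key)) {
      firstPerKey.set(key, {
        _id: { docId: rec._id, valId: rec.values.id },
        name: rec.name,
        values: rec.values,
      });
    }
  }

  // Second $group: key on the original document _id, $push the survivors
  const perDoc = new Map();
  for (const g of firstPerKey.values()) {
    if (!perDoc.has(g._id.docId)) {
      perDoc.set(g._id.docId, { _id: g._id.docId, name: g.name, values: [] });
    }
    perDoc.get(g._id.docId).values.push(g.values);
  }
  return [...perDoc.values()];
}

const docs = [
  {
    _id: "a",
    name: "something",
    values: [
      { id: 1, val: "x" },
      { id: 1, val: "x" },
      { id: 2, val: "y" },
      { id: 1, val: "xxxxxx" },
    ],
  },
  // Hypothetical second document that reuses id 1; it must stay separate
  { _id: "b", name: "other", values: [{ id: 1, val: "z" }] },
];

console.log(JSON.stringify(classicDedupe(docs)));
```

One difference from the real pipeline: a JavaScript Map keeps insertion order, whereas MongoDB's $group does not guarantee the order of its output, so the rebuilt values array (and the output documents) may come back in a different order.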


Modern

This approach uses the $reduce operator to rebuild the array inside a single $addFields stage.

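The same logic in plain JavaScript:

function dedupeById(values) {
  var tmp = [];
  for (var value of values) {
    // keep the element only if no kept element has the same id
    if (!tmp.some(function (t) { return t.id === value.id; }))
      tmp.push(value);
  }
  return tmp;
}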

db.collection.aggregate([
  {
    $addFields: {
      values: {
        $reduce: {
          input: "$values",
          initialValue: [],
          in: {
            $concatArrays: [
              "$$value",
              {
                $cond: [
                  {
                    $in: [
                      "$$this.id",
                      "$$value.id"
                    ]
                  },
                  [],
                  [
                    "$$this"
                  ]
                ]
              }
            ]
          }
        }
      }
    }
  }
])
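The $reduce expression maps almost one-to-one onto JavaScript's Array.prototype.reduce: $$value is the accumulator, $$this is the current element, and "$$value.id" resolves to the array of id fields of the elements kept so far. The sketch below is a client-side model only (the helper name `modernDedupe` is hypothetical); it can be handy for checking the logic, or for deduplicating documents you have already fetched.

```javascript
// Client-side model of the $reduce expression; modernDedupe is a hypothetical name.
function modernDedupe(values) {
  return values.reduce((value, thisItem) => {
    // value ~ $$value, thisItem ~ $$this
    // "$$value.id": the id fields of the elements kept so far
    const keptIds = value.map((v) => v.id);
    // $cond + $in + $concatArrays: append nothing for a duplicate id, else append [$$this]
    return value.concat(keptIds.includes(thisItem.id) ? [] : [thisItem]);
  }, []); // initialValue: []
}

const values = [
  { id: 1, val: "x" },
  { id: 1, val: "x" },
  { id: 2, val: "y" },
  { id: 1, val: "xxxxxx" },
];
console.log(modernDedupe(values));
```

In this model each element scans the ids kept so far, so the work grows with the square of the array length; the $reduce expression performs the same per-element $in check, which is unlikely to matter for short arrays like the one in the question.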



Comments

Thank you! Is there any difference/preference between "classic" and "modern" in terms of performance or dealing with large collections?
@chacmool Need to try. If you would like to see a benchmark, let me know.
Love reacts only <3
I get the error "undefined variable" for "value". Where are we defining value?
@JayHaran $$value and $$this are variables that the $reduce operator defines for you. While iterating the array, $$this is the current item and $$value is the accumulated result so far (it starts as initialValue).
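The roles of $$value and $$this can be traced in plain JavaScript: $reduce starts $$value at initialValue, evaluates `in` once per element with $$this bound to that element, and the result becomes the next $$value. This is a client-side model only; the helper name `traceReduce` is hypothetical. It records the ids held in $$value after each step for the question's sample array.

```javascript
// Records the accumulator ($$value) after each step of the dedupe reduce.
// traceReduce is a hypothetical helper used only for illustration.
function traceReduce(values) {
  const trace = [];
  let value = [];                       // $$value starts as initialValue
  for (const thisItem of values) {      // $$this is the current element
    const seen = value.map((v) => v.id);
    // duplicate id: keep $$value unchanged; otherwise append $$this
    value = seen.includes(thisItem.id) ? value : value.concat([thisItem]);
    trace.push(value.map((v) => v.id)); // ids in $$value after this step
  }
  return trace;
}

const sample = [
  { id: 1, val: "x" },
  { id: 1, val: "x" },
  { id: 2, val: "y" },
  { id: 1, val: "xxxxxx" },
];
for (const ids of traceReduce(sample)) console.log(JSON.stringify(ids));
```

The accumulator only grows when $$this carries an id not seen before, which is why the later { id: 1, val: "xxxxxx" } is dropped.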

You can use $reduce. Try the query below:

db.collection.aggregate([
  {
    $addFields: {
      values: {
        $reduce: {
          input: "$values",
          initialValue: [],
          in: {
            $cond: [
              { $in: ["$$this.id", "$$value.id"] }, /** If this 'id' is already in the accumulated array, keep that array unchanged; otherwise append the current object */
              "$$value",
              { $concatArrays: ["$$value", ["$$this"]] }
            ]
          }
        }
      }
    }
  }
]);


Comments

Was typing the explanation :D
@Valijon: I didn't get what you're saying?
I was posting the same solution and was writing the explanation, but you posted first.
@Valijon :-) :-)
