
I'm fairly new to MongoDB and I'm trying to merge an embedded array in a MongoDB collection. The schema for my Project collection is as follows:

Projects:
{
    _id: ObjectId(),
    client_id: String,
    description: String,
    samples: [
        {
            location: String,      //Unique
            name: String,
        }
      ...
    ]
}

A user can upload a JSON file that is in the form of:

[
    {
        location: String,     //Same location as in above schema
        concentration: float
    }
  ...
]

The samples array has the same length as the uploaded data array. I'm trying to figure out how to add the data field to every element of my samples array, but I can't work out how to do it from the MongoDB documentation. I can load my JSON data in as "data", and I want to merge based on the common "location" field:

db.projects.update({_id: myId}, {$set : {"samples.$[].data" : data[location]}});

But I can't figure out how to get the matching index of the JSON array into the update query, and I haven't been able to find any examples in the MongoDB documentation, or any questions like this.
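To make the goal concrete, here is a minimal in-memory sketch of the intended merge. The function name mergeByLocation is hypothetical. It only illustrates the desired result: reading and rewriting the whole array client-side like this is not an atomic update, which is what the approaches in the answer address.

```javascript
// Hypothetical helper: merges uploaded measurements into the samples
// array by matching on the shared "location" field (in memory only).
function mergeByLocation(samples, uploaded) {
  // Index the uploaded rows by location for constant-time lookups.
  const byLocation = new Map(uploaded.map(row => [row.location, row]));
  return samples.map(sample => {
    const row = byLocation.get(sample.location);
    if (!row) return { ...sample };      // no upload for this location
    const { location, ...fields } = row; // drop the join key itself
    return { ...sample, ...fields };
  });
}

const samples = [
  { location: "A", name: "Location A" },
  { location: "B", name: "Location B" }
];
const uploaded = [
  { location: "B", concentration: 5 },
  { location: "A", concentration: 3 }
];
console.log(mergeByLocation(samples, uploaded));
```

Because the lookup is by location rather than by position, the uploaded rows do not need to be in the same order as the stored array.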

Any help would be much appreciated!

  • Why not use a dictionary if you need a key/value type thing? Commented Jun 4, 2018 at 20:23
  • Definitely addressed it, sorry, I'm fairly new to this site, just accepted your answer, thanks again! Commented Jun 13, 2018 at 13:19

1 Answer

MongoDB 3.6 Positional Filtered Updates

So you're actually in the right "ballpark" with the all positional $[] operator, but the problem is that it simply applies to "every" array element. Since what you want is to update only the "matched" entries, you actually want the positional filtered $[<identifier>] operator instead.

As you note, your "location" is going to be unique within the array. Relying on "index positions" is really not reliable for atomic updates, but matching on the "unique" property is. Basically you need to get from something like this:

let input = [
  { location: "A", concentration: 3, other: "c" },
  { location: "C", concentration: 4, other: "a" }
];

To this:

{
  "$set": {
    "samples.$[l0].concentration": 3,
    "samples.$[l0].other": "c",
    "samples.$[l1].concentration": 4,
    "samples.$[l1].other": "a"
  },
  "arrayFilters": [
    {
      "l0.location": "A"
    },
    {
      "l1.location": "C"
    }
  ]
}

And that really is just a matter of applying some basic functions to the provided input array:

let arrayFilters = input.map(({ location },i) => ({ [`l${i}.location`]: location }));

let $set = input.reduce((o,{ location, ...e },i) =>
  ({
    ...o,
    ...Object.entries(e).reduce((oe,[k,v]) => ({ ...oe, [`samples.$[l${i}].${k}`]: v }),{})
  }),
  {}
);

log({ $set, arrayFilters });

The Array.map() simply takes the values of the input and creates a list of identifiers matching the location values for arrayFilters. The $set statement is built with Array.reduce() at two levels: the outer reduce merges the keys produced for each input element, and the inner reduce produces one key for each field in that element, after location has been removed from consideration, since it is only used for matching and is not being updated.
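If you need this in more than one place, the same construction can be wrapped in a function. This is a minimal sketch; buildFilteredUpdate and its arrayPath/matchKey parameters are hypothetical names, and it assumes the match key is present and unique in every input element.

```javascript
// Hypothetical reusable version of the map/reduce construction above.
// arrayPath: the array field to update (e.g. "samples")
// matchKey:  the unique field used to pick elements (e.g. "location")
function buildFilteredUpdate(input, arrayPath, matchKey) {
  const $set = {};
  const arrayFilters = [];
  input.forEach((entry, i) => {
    const id = `l${i}`; // identifier pairing $set paths with one filter
    arrayFilters.push({ [`${id}.${matchKey}`]: entry[matchKey] });
    for (const [k, v] of Object.entries(entry)) {
      if (k === matchKey) continue; // the match key itself is not updated
      $set[`${arrayPath}.$[${id}].${k}`] = v;
    }
  });
  return { $set, arrayFilters };
}

const input = [
  { location: "A", concentration: 3, other: "c" },
  { location: "C", concentration: 4, other: "a" }
];
console.log(JSON.stringify(buildFilteredUpdate(input, "samples", "location"), null, 2));
```

For the sample input this produces the same $set and arrayFilters shown earlier.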

Alternatively, loop with for..of:

let arrayFilters = [];
let $set = {};

for ( let [i, { location, ...e }] of Object.entries(input) ) {
  arrayFilters.push({ [`l${i}.location`]: location });
  for ( let [k,v] of Object.entries(e) ) {
    $set[`samples.$[l${i}].${k}`] = v;
  }
}

Note we use Object.entries() here as well as the "object spread" ... syntax in construction. If you find yourself in a JavaScript environment without this support, then Object.keys() and Object.assign() are basically drop-in replacements with little change.
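As a sketch of that substitution, here is the same construction written with Object.keys() and Object.assign() in place of Object.entries() and object spread (Object.assign() itself still needs an ES2015-level environment). It should produce identical $set and arrayFilters values.

```javascript
// Same construction without Object.entries() or object spread:
// Object.keys() supplies the field names and Object.assign() merges them.
var input = [
  { location: "A", concentration: 3, other: "c" },
  { location: "C", concentration: 4, other: "a" }
];

var arrayFilters = input.map(function (e, i) {
  var filter = {};
  filter["l" + i + ".location"] = e.location;
  return filter;
});

var $set = input.reduce(function (o, e, i) {
  var fields = {};
  Object.keys(e).forEach(function (k) {
    if (k !== "location") { // location is only used for matching
      fields["samples.$[l" + i + "]." + k] = e[k];
    }
  });
  return Object.assign(o, fields);
}, {});

console.log(JSON.stringify({ $set: $set, arrayFilters: arrayFilters }, null, 2));
```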

Then those can actually be applied within an update as in:

Project.update({ client_id: 'ClientA' }, { $set }, { arrayFilters });

So the positional filtered $[<identifier>] is used here to create "matching pairs" of entries within the $set modifier and within the arrayFilters option of update(). For each "location" we create an identifier that matches that value within arrayFilters, and then use that same identifier in the actual $set statement, so that only the array entry matching the identifier's condition is updated.
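To make the pairing concrete, here is a toy in-memory emulation of how a $set path like samples.$[l0].concentration combines with a filter like { "l0.location": "A" }. It is only a sketch under simplifying assumptions (one array field, plain equality filters, no nesting) and is not how the server implements it; applyFilteredSet is a hypothetical name.

```javascript
// Toy, in-memory emulation (NOT MongoDB's implementation) of how
// "arr.$[id].field" paths pair with { "id.key": value } arrayFilters.
// Assumes one array field and simple equality filters only.
function applyFilteredSet(doc, $set, arrayFilters) {
  // Map each identifier to its condition, e.g. l0 -> ["location", "A"]
  const conds = {};
  for (const f of arrayFilters) {
    const [path, value] = Object.entries(f)[0];
    const [id, key] = path.split(".");
    conds[id] = [key, value];
  }
  const out = JSON.parse(JSON.stringify(doc)); // work on a copy
  for (const [path, value] of Object.entries($set)) {
    const m = /^(\w+)\.\$\[(\w+)\]\.(\w+)$/.exec(path);
    if (!m) throw new Error(`unsupported path: ${path}`);
    const [, arrayField, id, field] = m;
    if (!conds[id]) throw new Error(`no array filter for identifier ${id}`);
    const [key, expected] = conds[id];
    // Every element satisfying the identifier's filter is updated.
    for (const el of out[arrayField]) {
      if (el[key] === expected) el[field] = value;
    }
  }
  return out;
}

const doc = { samples: [{ location: "A" }, { location: "B" }, { location: "C" }] };
const updated = applyFilteredSet(
  doc,
  { "samples.$[l0].concentration": 3, "samples.$[l1].concentration": 4 },
  [{ "l0.location": "A" }, { "l1.location": "C" }]
);
console.log(updated.samples);
```

Element B is left untouched because no identifier's filter matches it, which mirrors the real output in the demonstration listing.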

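The only real rules for "identifiers" are that they must begin with a lowercase letter and contain only alphanumeric characters, and that each identifier used in the $set needs exactly one corresponding document in arrayFilters. Note also that an identifier applies to every array element that matches its filter, not just the first one, which is why matching on the unique location value keeps each identifier to a single entry. The updates then only touch those entries which actually match the condition.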
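If the arrayFilters are built dynamically, a client-side pre-flight check can catch bad identifiers before the server rejects the update. This is a hypothetical helper written against my understanding of the rules above (lowercase first letter, alphanumeric only, one identifier per filter document, no identifier repeated across filters); the server remains the authority.

```javascript
// Hypothetical pre-flight check for arrayFilters identifiers.
const IDENTIFIER = /^[a-z][a-zA-Z0-9]*$/;

function checkIdentifiers(arrayFilters) {
  const seen = new Set();
  for (const filter of arrayFilters) {
    // All keys in one filter document should share a single identifier,
    // e.g. { "l0.location": "A", "l0.name": "x" } is one filter on l0.
    const ids = new Set(Object.keys(filter).map(path => path.split(".")[0]));
    if (ids.size !== 1) throw new Error("each filter must use exactly one identifier");
    const [id] = ids;
    if (!IDENTIFIER.test(id)) throw new Error(`invalid identifier: ${id}`);
    if (seen.has(id)) throw new Error(`duplicate identifier: ${id}`);
    seen.add(id);
  }
  return true;
}

console.log(checkIdentifiers([{ "l0.location": "A" }, { "l1.location": "C" }]));
```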

Earlier MongoDB fixed indexes

Without support for that, you are basically falling back to "index positions", and that's really not that reliable. More often than not you will actually need to read each document and determine what is already in the array before even updating. But presuming "parity", where the index positions of the input line up with the stored array, then:

let input = [
  { location: "A", concentration: 3 },
  { location: "B", concentration: 5 },
  { location: "C", concentration: 4 }
];

let $set = input.reduce((o,e,i) =>
  ({ ...o, [`samples.${i}.concentration`]: e.concentration }),{}
);

log({ $set });

Producing an update statement like:

{
  "$set": {
    "samples.0.concentration": 3,
    "samples.1.concentration": 5,
    "samples.2.concentration": 4
  }
}
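Before relying on hard indexes like these, it can help to check that parity actually holds. A minimal sketch, assuming the stored array has already been read into a plain samples array; hasParity and buildIndexedSet are hypothetical names, not part of MongoDB or Mongoose. Unlike the snippet above it copies every non-location field, not just concentration.

```javascript
// Hypothetical guard: hard indexes are only safe when the input lines up
// element-for-element with the stored array ("parity").
function hasParity(samples, input) {
  return samples.length === input.length &&
    input.every((e, i) => samples[i].location === e.location);
}

// Builds "samples.<i>.<field>" keys, refusing to run without parity.
function buildIndexedSet(samples, input) {
  if (!hasParity(samples, input)) {
    throw new Error("input does not line up with the stored array");
  }
  const $set = {};
  input.forEach(({ location, ...fields }, i) => {
    for (const [k, v] of Object.entries(fields)) {
      $set[`samples.${i}.${k}`] = v;
    }
  });
  return $set;
}

const samples = [{ location: "A" }, { location: "B" }, { location: "C" }];
const input = [
  { location: "A", concentration: 3 },
  { location: "B", concentration: 5 },
  { location: "C", concentration: 4 }
];
console.log({ $set: buildIndexedSet(samples, input) });
```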

Or without the parity:

let input = [
  { location: "A", concentration: 3, other: "c" },
  { location: "C", concentration: 4, other: "a" }
];


// Need to get the document to compare without parity
let doc = await Project.findOne({ "client_id": "ClientA" });

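let $set = input.reduce((o,{ location, ...e }) => {
  // Look up the array index of this location in the document just read
  let idx = doc.samples.map(c => c.location).indexOf(location);
  if (idx === -1) return o;   // location is not in the stored array
  return {
    ...o,
    ...Object.entries(e).reduce((oe,[k,v]) =>
      ({ ...oe, [`samples.${idx}.${k}`]: v }),
      {}
    )
  };
}, {});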

log({ $set });


await Project.update({ client_id: 'ClientA' },{ $set });

Producing the statement matching on the indexes (after you actually read the document):

{
  "$set": {
    "samples.0.concentration": 3,
    "samples.0.other": "c",
    "samples.2.concentration": 4,
    "samples.2.other": "a"
  }
}

Note, of course, that for each "update set" you really have no option other than reading the document first to determine which indexes you will update. This is generally not a good idea: aside from the overhead of reading each document before a write, there is no guarantee that the array itself remains unchanged by other processes between the read and the write. Using a "hard index" presumes that everything is still the same, when that may not actually be the case.
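A possible mitigation, which is an assumption layered on top of the approach above rather than part of it, is optimistic concurrency: put the expected location at each resolved index into the update filter, so the write simply matches nothing if the array has shifted since it was read. buildGuardedIndexUpdate is a hypothetical name, and the sketch assumes every input location exists in the stored array.

```javascript
// Hypothetical mitigation sketch: resolve indexes from a previously read
// document, and add "samples.<i>.location" conditions to the filter so the
// write only matches if those positions still hold the same locations.
function buildGuardedIndexUpdate(baseFilter, samples, input) {
  // One pass over the stored array: location -> index
  const indexOf = new Map(samples.map((s, i) => [s.location, i]));
  const filter = { ...baseFilter };
  const $set = {};
  for (const { location, ...fields } of input) {
    const i = indexOf.get(location);
    if (i === undefined) throw new Error(`location not found: ${location}`);
    filter[`samples.${i}.location`] = location; // guard against array changes
    for (const [k, v] of Object.entries(fields)) {
      $set[`samples.${i}.${k}`] = v;
    }
  }
  return { filter, update: { $set } };
}

const stored = [{ location: "A" }, { location: "B" }, { location: "C" }];
const changes = [
  { location: "A", concentration: 3, other: "c" },
  { location: "C", concentration: 4, other: "a" }
];
const { filter, update } = buildGuardedIndexUpdate({ client_id: "ClientA" }, stored, changes);
console.log(JSON.stringify({ filter, update }, null, 2));
```

If the write result reports that no document matched, the array changed after it was read, and you would re-read the document and rebuild the update.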

Earlier MongoDB positional matches

Where the data permits, it's generally better to issue a series of standard positional $ updates instead. Here location is indeed unique, so it's a good candidate, and most importantly you do not need to read the existing documents to compare arrays for indexes:

let input = [
  { location: "A", concentration: 3, other: "c" },
  { location: "C", concentration: 4, other: "a" }
];

let batch = input.map(({ location, ...e }) =>
  ({
    updateOne: {
      filter: { client_id: "ClientA", 'samples.location': location },
      update: {
        $set: Object.entries(e)
          .reduce((oe,[k,v]) => ({ ...oe, [`samples.$.${k}`]: v }), {})
      }
    }
  })
);

log({ batch });

await Project.bulkWrite(batch);

A bulkWrite() sends multiple update operations, but it does so in a single request and response, just like any other update operation. Indeed, if you are processing a "list of changes", then reading each document for comparison where needed and then constructing one big bulkWrite() is the direction to go, instead of issuing individual writes, and that applies to all of the previous examples as well.

The big difference is "one update instruction per array element" in the change set. This is the safe way to do things in releases without "positional filtered" support, even if it means more write operations.
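As a sketch of the "one big bulkWrite()" idea across several documents, the positional $ operations for multiple projects can be folded into a single array. buildBatch and the changesByClient shape (client_id mapped to a list of changes) are hypothetical; the result is what you would pass to Project.bulkWrite().

```javascript
// Hypothetical sketch: fold change sets for several projects into ONE
// bulkWrite() array, one positional "$" updateOne per array element.
function buildBatch(changesByClient) {
  const batch = [];
  for (const [clientId, input] of Object.entries(changesByClient)) {
    for (const { location, ...fields } of input) {
      const $set = {};
      for (const [k, v] of Object.entries(fields)) {
        $set[`samples.$.${k}`] = v; // "$" = the element matched by the filter
      }
      batch.push({
        updateOne: {
          filter: { client_id: clientId, "samples.location": location },
          update: { $set }
        }
      });
    }
  }
  return batch;
}

const changesByClient = {
  ClientA: [
    { location: "A", concentration: 3, other: "c" },
    { location: "C", concentration: 4, other: "a" }
  ],
  ClientB: [{ location: "B", concentration: 5 }]
};
const batch = buildBatch(changesByClient);
console.log(JSON.stringify(batch, null, 2));
```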

Demonstration

A full demonstration listing follows. Note I'm using "mongoose" here for simplicity, but there is nothing really "mongoose specific" about the actual updates themselves. The same applies to any implementation, and in particular to the JavaScript examples using Array.map() and Array.reduce() to process the list for construction.

const mongoose = require('mongoose');
const { Schema } = mongoose;

const uri = 'mongodb://localhost/test';

mongoose.Promise = global.Promise;
mongoose.set('debug',true);

const sampleSchema = new Schema({
  location: String,
  name: String,
  concentration: Number,
  other: String
});

const projectSchema = new Schema({
  client_id: String,
  description: String,
  samples: [sampleSchema]
});

const Project = mongoose.model('Project', projectSchema);

const log = data => console.log(JSON.stringify(data, undefined, 2));

(async function() {

  try {

    const conn = await mongoose.connect(uri);

    await Promise.all(Object.entries(conn.models).map(([k,m]) => m.remove()));

    await Project.create({
      client_id: "ClientA",
      description: "A Client",
      samples: [
        { location: "A", name: "Location A" },
        { location: "B", name: "Location B" },
        { location: "C", name: "Location C" }
      ]
    });

    let input = [
      { location: "A", concentration: 3, other: "c" },
      { location: "C", concentration: 4, other: "a" }
    ];

    let arrayFilters = input.map(({ location },i) => ({ [`l${i}.location`]: location }));

    let $set = input.reduce((o,{ location, ...e },i) =>
      ({
        ...o,
        ...Object.entries(e).reduce((oe,[k,v]) => ({ ...oe, [`samples.$[l${i}].${k}`]: v }),{})
      }),
      {}
    );

    log({ $set, arrayFilters });

    await Project.update(
      { client_id: 'ClientA' },
      { $set },
      { arrayFilters }
    );

    let project = await Project.findOne();
    log(project);

    await mongoose.disconnect();

  } catch(e) {
    console.error(e)
  } finally {
    process.exit()
  }

})()

And the output, for those who would rather not run it, shows the matching array elements updated:

Mongoose: projects.remove({}, {})
Mongoose: projects.insertOne({ _id: ObjectId("5b1778605c59470ecaf10fac"), client_id: 'ClientA', description: 'A Client', samples: [ { _id: ObjectId("5b1778605c59470ecaf10faf"), location: 'A', name: 'Location A' }, { _id: ObjectId("5b1778605c59470ecaf10fae"), location: 'B', name: 'Location B' }, { _id: ObjectId("5b1778605c59470ecaf10fad"), location: 'C', name: 'Location C' } ], __v: 0 })
{
  "$set": {
    "samples.$[l0].concentration": 3,
    "samples.$[l0].other": "c",
    "samples.$[l1].concentration": 4,
    "samples.$[l1].other": "a"
  },
  "arrayFilters": [
    {
      "l0.location": "A"
    },
    {
      "l1.location": "C"
    }
  ]
}
Mongoose: projects.update({ client_id: 'ClientA' }, { '$set': { 'samples.$[l0].concentration': 3, 'samples.$[l0].other': 'c', 'samples.$[l1].concentration': 4, 'samples.$[l1].other': 'a' } }, { arrayFilters: [ { 'l0.location': 'A' }, { 'l1.location': 'C' } ] })
Mongoose: projects.findOne({}, { fields: {} })
{
  "_id": "5b1778605c59470ecaf10fac",
  "client_id": "ClientA",
  "description": "A Client",
  "samples": [
    {
      "_id": "5b1778605c59470ecaf10faf",
      "location": "A",
      "name": "Location A",
      "concentration": 3,
      "other": "c"
    },
    {
      "_id": "5b1778605c59470ecaf10fae",
      "location": "B",
      "name": "Location B"
    },
    {
      "_id": "5b1778605c59470ecaf10fad",
      "location": "C",
      "name": "Location C",
      "concentration": 4,
      "other": "a"
    }
  ],
  "__v": 0
}

Or by hard index:

const mongoose = require('mongoose');
const { Schema } = mongoose;

const uri = 'mongodb://localhost/test';

mongoose.Promise = global.Promise;
mongoose.set('debug',true);

const sampleSchema = new Schema({
  location: String,
  name: String,
  concentration: Number,
  other: String
});

const projectSchema = new Schema({
  client_id: String,
  description: String,
  samples: [sampleSchema]
});

const Project = mongoose.model('Project', projectSchema);

const log = data => console.log(JSON.stringify(data, undefined, 2));

(async function() {

  try {

    const conn = await mongoose.connect(uri);

    await Promise.all(Object.entries(conn.models).map(([k,m]) => m.remove()));

    await Project.create({
      client_id: "ClientA",
      description: "A Client",
      samples: [
        { location: "A", name: "Location A" },
        { location: "B", name: "Location B" },
        { location: "C", name: "Location C" }
      ]
    });

    let input = [
      { location: "A", concentration: 3, other: "c" },
      { location: "C", concentration: 4, other: "a" }
    ];


    // Need to get the document to compare without parity
    let doc = await Project.findOne({ "client_id": "ClientA" });

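    let $set = input.reduce((o,{ location, ...e }) => {
      // Look up the array index of this location in the document just read
      let idx = doc.samples.map(c => c.location).indexOf(location);
      if (idx === -1) return o;   // location is not in the stored array
      return {
        ...o,
        ...Object.entries(e).reduce((oe,[k,v]) =>
          ({ ...oe, [`samples.${idx}.${k}`]: v }),
          {}
        )
      };
    }, {});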

    log({ $set });


    await Project.update(
      { client_id: 'ClientA' },
      { $set },
    );

    let project = await Project.findOne();
    log(project);

    await mongoose.disconnect();

  } catch(e) {
    console.error(e)
  } finally {
    process.exit()
  }

})()

And the output:

Mongoose: projects.remove({}, {})
Mongoose: projects.insertOne({ _id: ObjectId("5b1778e0f7be250f2b7c3fc8"), client_id: 'ClientA', description: 'A Client', samples: [ { _id: ObjectId("5b1778e0f7be250f2b7c3fcb"), location: 'A', name: 'Location A' }, { _id: ObjectId("5b1778e0f7be250f2b7c3fca"), location: 'B', name: 'Location B' }, { _id: ObjectId("5b1778e0f7be250f2b7c3fc9"), location: 'C', name: 'Location C' } ], __v: 0 })
Mongoose: projects.findOne({ client_id: 'ClientA' }, { fields: {} })
{
  "$set": {
    "samples.0.concentration": 3,
    "samples.0.other": "c",
    "samples.2.concentration": 4,
    "samples.2.other": "a"
  }
}
Mongoose: projects.update({ client_id: 'ClientA' }, { '$set': { 'samples.0.concentration': 3, 'samples.0.other': 'c', 'samples.2.concentration': 4, 'samples.2.other': 'a' } }, {})
Mongoose: projects.findOne({}, { fields: {} })
{
  "_id": "5b1778e0f7be250f2b7c3fc8",
  "client_id": "ClientA",
  "description": "A Client",
  "samples": [
    {
      "_id": "5b1778e0f7be250f2b7c3fcb",
      "location": "A",
      "name": "Location A",
      "concentration": 3,
      "other": "c"
    },
    {
      "_id": "5b1778e0f7be250f2b7c3fca",
      "location": "B",
      "name": "Location B"
    },
    {
      "_id": "5b1778e0f7be250f2b7c3fc9",
      "location": "C",
      "name": "Location C",
      "concentration": 4,
      "other": "a"
    }
  ],
  "__v": 0
}

And of course with standard "positional" $ syntax and updates:

const mongoose = require('mongoose');
const { Schema } = mongoose;

const uri = 'mongodb://localhost/test';

mongoose.Promise = global.Promise;
mongoose.set('debug',true);

const sampleSchema = new Schema({
  location: String,
  name: String,
  concentration: Number,
  other: String
});

const projectSchema = new Schema({
  client_id: String,
  description: String,
  samples: [sampleSchema]
});

const Project = mongoose.model('Project', projectSchema);

const log = data => console.log(JSON.stringify(data, undefined, 2));

(async function() {

  try {

    const conn = await mongoose.connect(uri);

    await Promise.all(Object.entries(conn.models).map(([k,m]) => m.remove()));

    await Project.create({
      client_id: "ClientA",
      description: "A Client",
      samples: [
        { location: "A", name: "Location A" },
        { location: "B", name: "Location B" },
        { location: "C", name: "Location C" }
      ]
    });

    let input = [
      { location: "A", concentration: 3, other: "c" },
      { location: "C", concentration: 4, other: "a" }
    ];

    let batch = input.map(({ location, ...e }) =>
      ({
        updateOne: {
          filter: { client_id: "ClientA", 'samples.location': location },
          update: {
            $set: Object.entries(e)
              .reduce((oe,[k,v]) => ({ ...oe, [`samples.$.${k}`]: v }), {})
          }
        }
      })
    );

    log({ batch });

    await Project.bulkWrite(batch);

    let project = await Project.findOne();
    log(project);

    await mongoose.disconnect();

  } catch(e) {
    console.error(e)
  } finally {
    process.exit()
  }

})()

And output:

Mongoose: projects.remove({}, {})
Mongoose: projects.insertOne({ _id: ObjectId("5b179142662616160853ba4a"), client_id: 'ClientA', description: 'A Client', samples: [ { _id: ObjectId("5b179142662616160853ba4d"), location: 'A', name: 'Location A' }, { _id: ObjectId("5b179142662616160853ba4c"), location: 'B', name: 'Location B' }, { _id: ObjectId("5b179142662616160853ba4b"), location: 'C', name: 'Location C' } ], __v: 0 })
{
  "batch": [
    {
      "updateOne": {
        "filter": {
          "client_id": "ClientA",
          "samples.location": "A"
        },
        "update": {
          "$set": {
            "samples.$.concentration": 3,
            "samples.$.other": "c"
          }
        }
      }
    },
    {
      "updateOne": {
        "filter": {
          "client_id": "ClientA",
          "samples.location": "C"
        },
        "update": {
          "$set": {
            "samples.$.concentration": 4,
            "samples.$.other": "a"
          }
        }
      }
    }
  ]
}
Mongoose: projects.bulkWrite([ { updateOne: { filter: { client_id: 'ClientA', 'samples.location': 'A' }, update: { '$set': { 'samples.$.concentration': 3, 'samples.$.other': 'c' } } } }, { updateOne: { filter: { client_id: 'ClientA', 'samples.location': 'C' }, update: { '$set': { 'samples.$.concentration': 4, 'samples.$.other': 'a' } } } } ], {})
Mongoose: projects.findOne({}, { fields: {} })
{
  "_id": "5b179142662616160853ba4a",
  "client_id": "ClientA",
  "description": "A Client",
  "samples": [
    {
      "_id": "5b179142662616160853ba4d",
      "location": "A",
      "name": "Location A",
      "concentration": 3,
      "other": "c"
    },
    {
      "_id": "5b179142662616160853ba4c",
      "location": "B",
      "name": "Location B"
    },
    {
      "_id": "5b179142662616160853ba4b",
      "location": "C",
      "name": "Location C",
      "concentration": 4,
      "other": "a"
    }
  ],
  "__v": 0
}

4 Comments

This is great and works perfectly. Let's say now that I have several properties I want to update, such as setting both a concentration and a volume for each sample. I'm trying to iterate over the keys in my input documents: let keys = Object.keys(input), and then use that inside the reduce function: let $set = result.reduce((o, e, i) => keys.forEach((key) => ({ ...o, keys.forEach((key) => ([`samples.$[l${i}].${key}`]: e[key) }), {} ); But this is not working for me, any suggestions?
I believe this is because I am running mongo 3.4.10, and according to this it is not implemented in my version yet
@nrichman Well, that's exactly what the answer tells you; it's quite specific about this being a MongoDB 3.6 feature, and that is prominent in the documentation itself. What I also showed you is using the actual "index", which, though bad practice, also works. See the lines with things like "samples.0.concentration": 3. Note, as I say, that unless your input is an exact copy, you need to read each document, read the array and compare, and then perform the update.
@nrichman Expanded on the different solutions with some more context, examples, and usage of any number of keys in addition to location in each possible "change set". You really should be looking at the section with standard positional $ updates, as that is generally the better option for earlier MongoDB versions without "positional filtered" support.
