Query from two collections in MongoDB

Question

I'm trying to find my way around MongoDB. It's the first time I'm using this database, coming from MySQL. But for a chat application I'm making, MongoDB was recommended as a better fit.

I have two collections:

conversations, in which I store each member's userID (the users themselves live in a MySQL database) and their join date (j):

{
    "_id" : ObjectId("5e35f2c840713a43aeeeb3d9"),
    "members" : [ 
        {
            "uID" : "1",
            "j" : 1580580922
        }, 
        {
            "uID" : "4",
            "j" : 1580580922
        }, 
        {
            "uID" : "5",
            "j" : 1580580922
        }
    ]
}

messages, in which I store the sender's userID (fromID), the message text (msg), a timestamp (t), the conversation ID (c_ID, referencing the collection above), and the per-recipient delivered (d) and read (r) status:

{
    "_id" : ObjectId("5e35ee5f40713a43aeeeb1c5"),
    "c_ID" : ObjectId("5e35f2c840713a43aeeeb3d9"),
    "fromID" : "1",
    "msg" : "What's up?",
    "t" : 1580591922,
    "d" : {
        "4" : 1580592039
    },
    "r" : {
        "4" : 1580592339
    }
}

What I want to do now is query the conversations for a specific user, let's say userID 1, together with the last message sent in that conversation.

I came up with the following:

db.getCollection('conversations').aggregate(
[{
    $match: {
        "members.uID": "1"
    }
},
{
    $lookup: {
        as: 'lastMessage',
        foreignField: 'c_ID',
        from: 'messages',
        localField: '_id',
    }
},
])

The problem is that this returns all messages in the conversation, not only the last one. I would like to limit the lookup to a single message, or use a better approach if there is one. Please let me know.

Any help is appreciated!

If you're not tied to these particular schemas, I would recommend storing the messages as an array in the conversation. That should greatly simplify any aggregation query you want to perform to find particular message(s) within a conversation (whether that's the first message, the most recent message, all messages from a certain user, etc.). This will also probably be a better fit for how your chat application uses conversations and messages. Essentially, this eliminates the "join" that is needed to find messages in a conversation. — donutsftw, Feb 5, 2020 at 14:22
I did consider that, but then I would need to perform an update every time a message is added. That's not very performant, is it? — PennyWise, Feb 5, 2020 at 15:29
An insert will likely be faster than an update, but probably not by much if you're updating the conversation by _id and only adding new messages. I think the benefits of having the messages contained within the conversation outweigh the slightly slower write performance: your data is more naturally structured, making it conceptually easier to work with, and read performance improves, because the application no longer needs two calls to the DB to load a conversation (one to get the conversation, then one to get all of its messages). — donutsftw, Feb 5, 2020 at 16:45
Would the solution below be bad? Also, if there are a lot of messages, holding them all in one document could hit the 16 MB document size limit, no? — PennyWise, Feb 5, 2020 at 16:55
There's nothing wrong with the solution below that I can see (although it's been a while since I did much with aggregation). You could run into the document size limit with lots of messages, so if that's a realistic concern I'd consider keeping a separate collection named oldMessages: each object in this collection contains an array of messages, and optionally the _id of another object in oldMessages holding even older messages. Your conversation entries may or may not have an "oldMessages" field, which, if present, is the _id of the next-oldest set of messages after the ones already in the conversation. — donutsftw, Feb 5, 2020 at 18:05
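A minimal sketch of the embedded design suggested in these comments, assuming each conversation document gets its own messages array (the helper names appendMessage and conversationsWithLastMessage are hypothetical). Appending is a single $push update by _id, and a negative $slice projection returns only the last array element, which is the newest message as long as messages are always pushed in time order. It does not handle the 16 MB concern; that would need something like the oldMessages overflow collection described above.

```javascript
// Hypothetical helpers for the embedded design: messages live in a
// "messages" array inside each conversation document (an assumption,
// not the question's current schema).
function appendMessage(conversationId, fromID, msg, t) {
  return {
    filter: { _id: conversationId },
    // $push appends the new message to the end of the embedded array
    update: { $push: { messages: { fromID: fromID, msg: msg, t: t } } }
  };
}

function conversationsWithLastMessage(userId) {
  return {
    filter: { "members.uID": userId },
    // A negative $slice projection returns only the last array element;
    // the conversation's other fields are still returned
    projection: { messages: { $slice: -1 } }
  };
}

// Usage in the mongo shell (conversationId is an ObjectId you already have):
// const u = appendMessage(conversationId, "1", "What's up?", 1580591922);
// db.conversations.updateOne(u.filter, u.update);
// const q = conversationsWithLastMessage("1");
// db.conversations.find(q.filter, q.projection);
```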

SuleymanSah · Accepted Answer · 2020-02-05 16:24:18Z


I assume the last message can be identified by its timestamp field, t.

After the $match and $lookup stages, we $unwind the messages and then $sort them by timestamp in descending order.

After sorting, the first document for each conversation carries its most recent message, so when we $group by the conversation _id we take that $first message as lastMessage, keep the conversation's own fields via $$ROOT, and finally use $replaceRoot to shape the result.

If that assumption holds, you can use the following aggregation:

db.conversations.aggregate([
  {
    $match: {
      "members.uID": "1"
    }
  },
  {
    $lookup: {
      foreignField: "c_ID",
      from: "messages",
      localField: "_id",
      as: "messages"
    }
  },
  {
    "$unwind": "$messages"
  },
  {
    "$sort": {
      "messages.t": -1
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "lastMessage": {
        "$first": "$messages"
      },
      "allFields": {
        "$first": "$$ROOT"
      }
    }
  },
  {
    "$replaceRoot": {
      "newRoot": {
        "$mergeObjects": [
          "$allFields",
          {
            "lastMessage": "$lastMessage"
          }
        ]
      }
    }
  },
  {
    $project: {
      messages: 0
    }
  }
])
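To make the stage-by-stage logic concrete, here is an in-memory JavaScript sketch that mimics what the pipeline above computes on plain objects (no MongoDB involved; lastMessagePerConversation is a hypothetical name, and _id values are plain strings rather than ObjectIds). It also shows one side effect of $unwind: a conversation with no messages produces no rows, so it disappears from the result unless $unwind is given preserveNullAndEmptyArrays: true. Note that MongoDB's $group does not guarantee output order, so add a final $sort if order matters.

```javascript
// In-memory illustration of $match -> $lookup/$unwind -> $sort -> $group -> $replaceRoot.
// Assumes documents shaped like the question's samples.
function lastMessagePerConversation(conversations, messages, userId) {
  // $match: conversations the user belongs to
  const matched = conversations.filter(c => c.members.some(m => m.uID === userId));

  // $lookup + $unwind: one row per (conversation, message) pair;
  // conversations without messages yield no rows at all
  const rows = [];
  for (const conv of matched) {
    for (const msg of messages) {
      if (msg.c_ID === conv._id) rows.push({ ...conv, messages: msg });
    }
  }

  // $sort: newest message first
  rows.sort((a, b) => b.messages.t - a.messages.t);

  // $group with $first: keep the first (newest) row seen per conversation
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row._id)) groups.set(row._id, row);
  }

  // $replaceRoot + $project { messages: 0 }: conversation fields plus lastMessage
  return [...groups.values()].map(({ messages: lastMessage, ...conv }) => ({ ...conv, lastMessage }));
}
```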

If the messages array is already sorted, the solution can be simplified, but this is a general solution.
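Another option, assuming MongoDB 3.6 or later (where $lookup accepts let and pipeline), is to sort and limit inside the lookup itself, so only one message per conversation is ever joined instead of all of them. This is a sketch rather than a drop-in replacement: lastMessagePipeline is a hypothetical helper, and the collection and field names are taken from the question.

```javascript
// Sketch: fetch only the newest message per conversation via a $lookup sub-pipeline.
// Requires MongoDB 3.6+ for the let/pipeline form of $lookup.
function lastMessagePipeline(userId) {
  return [
    // Conversations the user is a member of
    { $match: { "members.uID": userId } },
    {
      $lookup: {
        from: "messages",
        // Expose the conversation's _id to the sub-pipeline as $$convId
        let: { convId: "$_id" },
        pipeline: [
          // Join condition: messages whose c_ID equals this conversation's _id
          { $match: { $expr: { $eq: ["$c_ID", "$$convId"] } } },
          // Newest message first, then keep only that one
          { $sort: { t: -1 } },
          { $limit: 1 }
        ],
        // Array with at most one element; empty if the conversation has no messages
        as: "lastMessage"
      }
    }
  ];
}

// Usage in the mongo shell:
// db.conversations.aggregate(lastMessagePipeline("1"))
```

Unlike the $unwind version, this keeps conversations that have no messages yet (their lastMessage is an empty array). A compound index such as db.messages.createIndex({ c_ID: 1, t: -1 }) may help the sort-and-limit step, though whether the sub-pipeline can use it depends on the server version.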


answered Feb 5, 2020 at 14:02 by SuleymanSah; edited Feb 5, 2020 at 16:24


3 Comments

PennyWise Over a year ago

Seems to do what I want! The only thing left is that I now have both messages and lastMessage, containing the same data. Can I "hide" or remove the messages object? Or is it needed to perform the query?

SuleymanSah Over a year ago

@PennyWise you can easily remove unwanted fields using project stage, I updated the answer. Please don't forget to mark this answer and upvote.

PennyWise Over a year ago

I don't see the project stage in the updated answer. Can you add it again?
