I have a very simple collection with no indexes except _id. Each document contains a single array field, mies. The collection currently holds around 100 million documents, and I can see the following anomaly in the profiler:

{
  "op": "update",
  "ns": "DB.links",
  "command": {
    "q": {
      "_id": "f1b54f37-7f92-4e75-9ec6-5329349ce792_eb370c8a-6f33-4989-aa59-a26e1c9df46c"
    },
    "u": {
      "$push": {
        "mies": {
          "$each": [
            {
              "$oid": "5e39d07bec34b8000e7f86b7"
            }
          ]
        }
      }
    },
    "multi": true,
    "upsert": true
  },
  "keysExamined": 0,
  "docsExamined": 0,
  "nMatched": 0,
  "nModified": 0,
  "upsert": true,
  "keysInserted": 1,
  "numYield": 0,
  "locks": {
    "Global": {
      "acquireCount": {
        "r": 2,
        "w": 2
      }
    },
    "Database": {
      "acquireCount": {
        "w": 2
      },
      "acquireWaitCount": {
        "w": 1
      },
      "timeAcquiringMicros": {
        "w": 19486143
      }
    },
    "Collection": {
      "acquireCount": {
        "w": 1
      }
    },
    "oplog": {
      "acquireCount": {
        "w": 1
      }
    }
  },
  "millis": 19490,
  "planSummary": "IDHACK",
  "execStats": {
    "stage": "UPDATE",
    "nReturned": 0,
    "executionTimeMillisEstimate": 0,
    "works": 2,
    "advanced": 0,
    "needTime": 1,
    "needYield": 0,
    "saveState": 0,
    "restoreState": 0,
    "isEOF": 1,
    "invalidates": 0,
    "nMatched": 0,
    "nWouldModify": 0,
    "nInvalidateSkips": 0,
    "wouldInsert": true,
    "fastmodinsert": false,
    "inputStage": {
      "stage": "IDHACK",
      "nReturned": 0,
      "executionTimeMillisEstimate": 0,
      "works": 1,
      "advanced": 0,
      "needTime": 0,
      "needYield": 0,
      "saveState": 0,
      "restoreState": 0,
      "isEOF": 1,
      "invalidates": 0,
      "keysExamined": 0,
      "docsExamined": 0
    }
  }
}

As you can see, a simple upsert with a single $push to an array took 19 seconds. I believe most of the time was spent here:

      "timeAcquiringMicros": {
        "w": 19486143
      }

What should I check? How can I improve the performance?
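For reference, this is roughly what the profiled command looks like when issued from the mongo shell (the _id and ObjectId values are the ones from the profile entry above; the exact driver call used by the application may differ):

// multi: true + upsert: true in the profile corresponds to updateMany(..., { upsert: true })
db.links.updateMany(
  { _id: "f1b54f37-7f92-4e75-9ec6-5329349ce792_eb370c8a-6f33-4989-aa59-a26e1c9df46c" },
  { $push: { mies: { $each: [ ObjectId("5e39d07bec34b8000e7f86b7") ] } } },
  { upsert: true }
)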

3 Comments
  • Since you're only querying on _id, which is indexed by default, the query itself is fine; what's concerning is the 100 million docs. Why not consider sharding, or archiving old records if they aren't needed for most reads? Commented Feb 4, 2020 at 20:39
  • Yes, but due to business logic the data has to be accumulated for a period of time. I can see that the index size for hot collections exceeds MongoDB's cache size. Is that the reason for the slow lock acquisition? (A quick way to check this is sketched after these comments.) Commented Feb 4, 2020 at 20:42
  • I would say at least separate that collection from the others and maintain it alone in a different cluster, if possible (if you don't want to shard and want all docs on the same cluster). That way you can size the cluster (RAM + disk) appropriately for that collection and also get high I/O. Check this documentation :: docs.mongodb.com/manual/tutorial/ensure-indexes-fit-ram Commented Feb 4, 2020 at 20:53
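To check the index-size-versus-cache concern from the comments above, a rough sketch in the mongo shell (this assumes the WiredTiger storage engine; the field names are the standard collection stats and serverStatus output):

// Total size of all indexes on the collection, in bytes
db.links.stats().totalIndexSize

// Configured WiredTiger cache size and current usage, in bytes
var cache = db.serverStatus().wiredTiger.cache
print(cache["maximum bytes configured"])
print(cache["bytes currently in the cache"])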

2 Answers

2

While this doesn't directly provide a solution to your problem, I didn't see anyone else mention this as a potential cause. One of the reasons your operation is slow is how MongoDB handles this operation, specifically with regard to MongoDB's replication design. If you don't care for the explanation, just skip to the bottom.

Quoting the replication documentation:

Every operation that modifies the data on the primary is stored in a special capped collection called oplog (short for operations log). The oplog is asynchronously synchronized to all the secondaries which replay the operations to eventually be consistent with the primary.

For the oplog to work, its operations need to be idempotent. Since $push is not, MongoDB converts the operation into a $set, replacing the entire array.

The bottom line is that whenever you push something to an array, the entire array is replaced as far as replication is concerned.
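If you want to verify this on your own deployment, you can inspect the most recent oplog entry for the collection. A sketch (this assumes a replica set; the exact shape of the recorded update varies by MongoDB version):

// The oplog is a capped collection in the "local" database on each replica set member
var oplog = db.getSiblingDB("local").oplog.rs
oplog.find({ ns: "DB.links", op: "u" }).sort({ $natural: -1 }).limit(1).pretty()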


0

You cannot

MongoDB uses a B-tree algorithm for index operations such as insert, search, and delete.

Algorithm   Average     Worst case
Space       O(n)        O(n)
Search      O(log n)    O(log n)
Insert      O(log n)    O(log n)
Delete      O(log n)    O(log n)

MongoDB needs to walk down the ordered index to find a value. Since _id is the unique default index, MongoDB needs 27 iterations to find a document in the worst case (2^27 for 134,217,728 documents [~19 sec], 2^28 for 268,435,456 [~21 sec], etc.).
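For the arithmetic behind the 27-iteration figure, a quick check in the shell (the point is only that lookup depth grows logarithmically with the document count):

// ceil(log2(n)) comparisons in the worst case for an ordered index lookup
Math.ceil(Math.log2(100 * 1000 * 1000))  // 27, since 2^26 = 67,108,864 < 1e8 <= 2^27 = 134,217,728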

You can improve the search operation by creating a compound index, but that will penalize inserts, as MongoDB then needs to update both the _id index and the compound index.

4 Comments

Which compound index are you talking about? And how is the search related to the 19-second lock operation?
Imagine you have only 100 documents and MongoDB needs 1 sec for each iteration. If you don't have an index, MongoDB scans 1, 2, 3, ... 100 (100 secs), also known as O(n). But with a B-tree you need only about 7 tries (7 secs) (discussed here). Compound indexes are discussed here.
I know what a B-tree is, and what a compound index is. Unfortunately, that has nothing to do with the problem.
@silent-box I've answered your main question, "How can I improve the performance?", and explained why it's not possible.
