
I'm facing a counter-intuitive performance issue with my MongoDB sharded cluster where queries with fewer values in an $in clause are significantly slower than queries with more values.

The Issue:

Query with 8 values in $in: 300ms

Query with 3 values in $in: 30 seconds (100x slower)

Both queries use the same collection, same index, and same shard key.

Document Structure (example):

{
  "userId": "user123",
  "timestamp": "2025-01-15T10:30:00.000Z",
  "eventType": "page_view",
  "sessionId": "session_2025-01-15",  // Shard Key
  "metadata": {...}
}

Query Code:

// Fast - 300ms
const fast = await collection.find({
  sessionId: {$in: ['val1','val2','val3','val4','val5','val6','val7','val8']}
})
  .sort({timestamp: -1})
  .limit(50)
  .hint("sessionId_1_timestamp_-1")
  .toArray();

// Slow - 30 seconds
const slow = await collection.find({
  sessionId: {$in: ['val1','val2','val3']}
})
  .sort({timestamp: -1})
  .limit(50)
  .hint("sessionId_1_timestamp_-1")
  .toArray();

Index: {sessionId: 1, timestamp: -1} (shard key is sessionId)

explainStats:
  stages: [
    {
      stage: '$query',
      timeInclusiveMS: 60811.2455,
      timeExclusiveMS: 60811.2455,
      in: 837525,
      out: 837525,
      dependency: {
        getNextPageCount: 31,
        count: 30,
        time: 0,
        bytes: 259495372
      },
      details: {
        database: 'test-db',
        collection: 'session-events',
        query: {
          sessionId: {
            '$in': [
              'session_2025-01-15',
              'session_2025-01-16'
            ]
          }
        },
        indexUsage: {
          pathsIndexed: {
            individualIndexes: [],
            compoundIndexes: [
              {
                sessionId: 1,
                timestamp: -1
              }
            ]
          },
          pathsNotIndexed: {
            individualIndexes: [
              'sessionId'
            ],
            compoundIndexes: []
          }
        },
        sort: {
          timestamp: -1
        },
        shardInformation: [
          {
            activityId: '<empty>',
            shardKeyRangeId: '[,15555555555555555555555555555555) move next',
            durationMS: 2648.2615,
            preemptions: 0,
            outputDocumentCount: 27932,
            retrievedDocumentCount: 27934
          },
          ...
        ],
        queryMetrics: {
          retrievedDocumentCount: 865467,
          retrievedDocumentSizeBytes: 408356437,
          outputDocumentCount: 865457,
          outputDocumentSizeBytes: 325033343,
          indexHitRatio: 1,
          totalQueryExecutionTimeMS: 43843.4998,
          queryPreparationTimes: {
            queryCompilationTimeMS: 3.16,
            logicalPlanBuildTimeMS: 1.57,
            physicalPlanBuildTimeMS: 4.79,
            queryOptimizationTimeMS: 0.03
          },
          indexLookupTimeMS: 1575.1799,
          documentLoadTimeMS: 34403.5198,
          vmExecutionTimeMS: 43132.1599,
          runtimeExecutionTimes: {
            queryEngineExecutionTimeMS: 2853.9703,
            systemFunctionExecutionTimeMS: 61.66,
            userDefinedFunctionExecutionTimeMS: 0
          },
          documentWriteTimeMS: 4299.4899
        }
      }
    }
  ],
  estimatedDelayFromRateLimitingInMilliseconds: 0
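For scale, the queryMetrics above can be boiled down to a scan-efficiency number: how many documents the server loaded per document actually returned, and what share of the runtime went into document loading. A small helper (the function name and shape are mine, not part of any driver API) applied to the numbers from the slow run:

```javascript
// Summarise explain queryMetrics: documents loaded per result returned,
// and the fraction of total runtime spent loading documents.
// `metrics` follows the queryMetrics shape shown in the explain output above.
function scanEfficiency(metrics, limit) {
  return {
    retrieved: metrics.retrievedDocumentCount,
    docsPerResult: Math.round(metrics.retrievedDocumentCount / limit),
    loadShare: metrics.documentLoadTimeMS / metrics.totalQueryExecutionTimeMS
  };
}

// Numbers taken directly from the slow query's explain output:
const slowRun = scanEfficiency(
  {
    retrievedDocumentCount: 865467,
    documentLoadTimeMS: 34403.5198,
    totalQueryExecutionTimeMS: 43843.4998
  },
  50 // the query's .limit(50)
);
// slowRun.docsPerResult -> 17309 documents loaded per document returned
// slowRun.loadShare     -> ~0.78 (about 78% of runtime in document load)
```

So the slow plan is loading roughly 17,000 documents for every one it returns, which matches the comment below observing that 34s of the 43s runtime was document loading.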

The inversion is consistent: fewer values in the $in clause reliably perform worse. Why would this happen in a sharded environment, and how can I fix it?
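One workaround I am considering, assuming the slowdown comes from how the server merges the per-value $in ranges with the sort: issue one query per sessionId value, each already index-sorted and limited, then merge the small result sets client-side. The merge step is a plain function (names are mine, for illustration); the find calls mirror the ones above.

```javascript
// Merge several result sets, each already sorted by timestamp descending,
// into a single top-N list in the same descending order.
function mergeTopN(resultSets, n) {
  return resultSets
    .flat()
    .sort((a, b) =>
      a.timestamp < b.timestamp ? 1 : a.timestamp > b.timestamp ? -1 : 0
    )
    .slice(0, n);
}

// Hypothetical usage against the collection from the question:
// const perSession = await Promise.all(
//   sessionIds.map(id =>
//     collection.find({ sessionId: id })   // single shard-key value per query
//       .sort({ timestamp: -1 })           // served directly by the index
//       .limit(50)
//       .toArray()
//   )
// );
// const top50 = mergeTopN(perSession, 50);
```

Each per-value query can walk the {sessionId: 1, timestamp: -1} index in order and stop after 50 documents, so at most 50 × (number of values) documents ever leave the server.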

Comments:
  • Can you show the execution plan: .explain("executionStats")? Commented Oct 22 at 18:36
  • @FranckPachot, sure, I have updated it in the problem statement. Please see. Commented Oct 23 at 1:45
  • AFAIK, try to avoid using hint, as the MongoDB query optimiser will choose the best index for query performance, unless you have confirmed that your sessionId_1_timestamp_-1 index is the best index for this query. blog.thnkandgrow.com/… Commented Oct 23 at 1:45
  • @YongShun, yes, I have already tried that, but no luck. Commented Oct 23 at 1:47
  • That explain indicates 34s of the 43s runtime was spent loading documents. That seems slow for 865k documents totalling ~400MB. Commented Oct 23 at 2:25
