I'm facing a counter-intuitive performance issue with my MongoDB sharded cluster where queries with fewer values in an $in clause are significantly slower than queries with more values.
The Issue:
Query with 8 values in $in: 300ms
Query with 3 values in $in: 30 seconds (100x slower)
Both queries use the same collection, same index, and same shard key.
Document Structure (example):
{
  "userId": "user123",
  "timestamp": "2025-01-15T10:30:00.000Z",
  "eventType": "page_view",
  "sessionId": "session_2025-01-15", // Shard Key
  "metadata": {...}
}
Query Code:
// Fast - 300ms
const fast = await collection
  .find({
    sessionId: {$in: ['val1', 'val2', 'val3', 'val4', 'val5', 'val6', 'val7', 'val8']}
  })
  .sort({timestamp: -1})
  .limit(50)
  .hint("sessionId_1_timestamp_-1")
  .toArray();

// Slow - 30 seconds
const slow = await collection
  .find({
    sessionId: {$in: ['val1', 'val2', 'val3']}
  })
  .sort({timestamp: -1})
  .limit(50)
  .hint("sessionId_1_timestamp_-1")
  .toArray();
Index: {sessionId: 1, timestamp: -1} (shard key is sessionId)
explainStats:
stages: [
  {
    stage: '$query',
    timeInclusiveMS: 60811.2455,
    timeExclusiveMS: 60811.2455,
    in: 837525,
    out: 837525,
    dependency: {
      getNextPageCount: 31,
      count: 30,
      time: 0,
      bytes: 259495372
    },
    details: {
      database: 'test-db',
      collection: 'session-events',
      query: {
        sessionId: {
          '$in': [
            'session_2025-01-15',
            'session_2025-01-16'
          ]
        }
      },
      indexUsage: {
        pathsIndexed: {
          individualIndexes: [],
          compoundIndexes: [
            {
              sessionId: 1,
              timestamp: -1
            }
          ]
        },
        pathsNotIndexed: {
          individualIndexes: [
            'sessionId'
          ],
          compoundIndexes: []
        }
      },
      sort: {
        timestamp: -1
      },
      shardInformation: [
        {
          activityId: '<empty>',
          shardKeyRangeId: '[,15555555555555555555555555555555) move next',
          durationMS: 2648.2615,
          preemptions: 0,
          outputDocumentCount: 27932,
          retrievedDocumentCount: 27934
        },
        ...
      ],
      queryMetrics: {
        retrievedDocumentCount: 865467,
        retrievedDocumentSizeBytes: 408356437,
        outputDocumentCount: 865457,
        outputDocumentSizeBytes: 325033343,
        indexHitRatio: 1,
        totalQueryExecutionTimeMS: 43843.4998,
        queryPreparationTimes: {
          queryCompilationTimeMS: 3.16,
          logicalPlanBuildTimeMS: 1.57,
          physicalPlanBuildTimeMS: 4.79,
          queryOptimizationTimeMS: 0.03
        },
        indexLookupTimeMS: 1575.1799,
        documentLoadTimeMS: 34403.5198,
        vmExecutionTimeMS: 43132.1599,
        runtimeExecutionTimes: {
          queryEngineExecutionTimeMS: 2853.9703,
          systemFunctionExecutionTimeMS: 61.66,
          userDefinedFunctionExecutionTimeMS: 0
        },
        documentWriteTimeMS: 4299.4899
      }
    }
  }
],
estimatedDelayFromRateLimitingInMilliseconds: 0,
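To put the explain numbers in perspective, here is a minimal sketch that derives a few ratios from one '$query' stage. The helper name summarizeQueryStage is hypothetical (it is not part of any driver), and it assumes the field names shown in the output above; the sample values are copied from that output, and the limit of 50 comes from the query code.

```javascript
// Hypothetical helper: summarise one '$query' stage of the explain output.
// Assumes the queryMetrics field names shown in the explain output above.
function summarizeQueryStage(stage, limit) {
  const m = stage.details.queryMetrics;
  return {
    // Documents read by the shards for each document the client asked for.
    retrievedPerRequested: m.retrievedDocumentCount / limit,
    // Fraction of total query execution time spent loading documents.
    documentLoadShare: m.documentLoadTimeMS / m.totalQueryExecutionTimeMS,
    // Average size of a retrieved document, in bytes.
    avgRetrievedDocBytes: m.retrievedDocumentSizeBytes / m.retrievedDocumentCount,
  };
}

// Values copied from the explain output above.
const stage = {
  details: {
    queryMetrics: {
      retrievedDocumentCount: 865467,
      retrievedDocumentSizeBytes: 408356437,
      totalQueryExecutionTimeMS: 43843.4998,
      documentLoadTimeMS: 34403.5198,
    },
  },
};

const summary = summarizeQueryStage(stage, 50);
console.log(summary);
```

With these numbers, roughly 17,000 documents are retrieved for every document requested by limit(50), and about 78% of the execution time goes to document loading, so the retrieved-versus-requested ratio is probably the first thing to compare between the fast and slow runs.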
The performance inversion is reproducible: queries with fewer values in $in consistently perform worse. Why would this happen in a sharded environment, and how can I fix it?
You shouldn't need hint, as the MongoDB query optimiser will choose the best index for query performance, unless you have confirmed that your sessionId_1_timestamp_-1 index is the best index for this query. blog.thnkandgrow.com/…