I have a Mongo collection, CollectionA, where each top-level object contains a nested array of meetings; each meeting has a start and end time. For example:
```
CollectionA = [
  {
    "id": 1,
    "meetings": [
      {"start": "2025-01-01T10:00:00", "end": "2025-01-01T11:00:00"},
      {"start": "2025-01-10T09:00:00", "end": "2025-01-10T09:30:00"}
    ]
  },
  {
    "id": 2,
    "meetings": [
      {"start": "2025-03-01T14:00:00", "end": "2025-03-01T15:00:00"}
    ]
  },
  ...
]
```
I frequently need to filter these objects by a date range — for example:
“Find all objects that have at least one meeting overlapping
[query_start, query_end].”
However, this dataset can be large (thousands of objects, each with dozens of nested intervals), and I also use pagination to load CollectionA gradually.
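For context, the textbook overlap test for an interval `[start, end]` against `[query_start, query_end]` is `start <= query_end AND end >= query_start`, and on the simplified schema above it can be phrased with `$elemMatch` so both bounds apply to the same array element, which is what lets a compound multikey index serve the query. A minimal sketch, assuming the `start`/`end` field names from the example (not my real schema):

```python
from datetime import datetime

def overlap_filter(query_start, query_end):
    # A meeting [start, end] overlaps [query_start, query_end] iff
    # start <= query_end AND end >= query_start. $elemMatch forces both
    # conditions onto the SAME array element; without it, MongoDB would
    # accept documents where different meetings satisfy each bound.
    return {
        "meetings": {
            "$elemMatch": {
                "start": {"$lte": query_end},
                "end": {"$gte": query_start},
            }
        }
    }

q = overlap_filter(datetime(2025, 1, 5), datetime(2025, 1, 6))
# Supporting multikey index (one-time):
# db.CollectionA.create_index([("meetings.start", 1), ("meetings.end", 1)])
```

This answers "at least one overlapping meeting" without any pipeline-side array iteration, though the deeper question of avoiding per-object work entirely still stands.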
🧠 Current Problem
Current approach:
- Filter CollectionA on some top-level fields.
- Filter the result further on the nested meetings array.
This approach is getting expensive, and splitting the collection is not something we plan to do. Here's my pipeline:
```python
# Python pipeline; assumes `import datetime` and `from bson import ObjectId`.
[
    {'$match': {
        'is_deleted': False,
        'seller_company_id': ObjectId('XXX'),
        'is_hidden': False,
    }},
    {'$lookup': {
        'from': 'Company',
        'let': {'companyId': '$seller_company_id'},
        'pipeline': [
            {'$match': {'$expr': {'$eq': ['$_id', '$$companyId']}}},
            {'$project': {'configuration.crm_stages': 1}},
        ],
        'as': 'company',
    }},
    {'$unwind': '$company'},
    {'$addFields': {
        'crm_stage_info': {'$ifNull': [
            {'$first': {'$filter': {
                'input': {'$ifNull': ['$company.configuration.crm_stages', []]},
                'as': 'stage',
                'cond': {'$eq': ['$$stage.name', '$crm_stage.name']},
            }}},
            None,
        ]},
    }},
    {'$addFields': {
        'crm_stage': {'$cond': [
            {'$ne': ['$crm_stage_info', None]},
            '$crm_stage_info',
            '$crm_stage',
        ]},
    }},
    {'$addFields': {
        'meetings': {'$filter': {
            'input': {'$ifNull': ['$meetings', []]},
            'as': 'meeting',
            'cond': {'$and': [
                {'$ne': ['$$meeting.is_meeting_deleted', True]},
                {'$gte': ['$$meeting.start_meet',
                          datetime.datetime(2025, 10, 7, 18, 30,
                                            tzinfo=datetime.timezone.utc)]},
                {'$lte': ['$$meeting.start_meet',
                          datetime.datetime(2025, 10, 15, 18, 29, 59, 999000,
                                            tzinfo=datetime.timezone.utc)]},
            ]},
        }},
    }},
    {'$project': {
        '_id': 1, 'name': 1, 'is_lead_qualified': 1,
        'is_from_calendar': {'$ifNull': ['$is_from_calendar', False]},
        'is_hidden': {'$ifNull': ['$is_hidden', False]},
        'is_closed': {'$ifNull': ['$is_closed', False]},
        'is_closed_won': {'$ifNull': ['$is_closed_won', False]},
        'is_closed_lost': {'$ifNull': ['$is_closed_lost', False]},
        'updated_on': 1, 'created_on': 1, 'user_id': 1,
        'average_sales_score': 1, 'total_sales_score': 1,
        'crm_info': 1, 'crm_stage': 1, 'recent_meeting_stage': 1,
        'meetings._id': 1, 'meetings.title': 1,
        'meetings.start_meet': 1, 'meetings.end_meet': 1,
        'meetings.meeting_stage': 1, 'meetings.bot_id': 1,
        'meetings.is_completed': 1, 'meetings.is_meet_proper': 1,
        'meetings.is_copilot_allowed': 1, 'meetings.crm_push_error': 1,
        'meetings.is_crm_data_pushed': 1, 'meetings.crm_push_error_info': 1,
    }},
    {'$match': {'meetings': {'$ne': []}}},
    {'$sort': {'meetings.start_meet': -1}},
    {'$facet': {
        'totalCount': [{'$count': 'count'}],
        'results': [{'$skip': 1}, {'$limit': 50}],
    }},
    {'$project': {
        'results': 1,
        'total': {'$ifNull': [{'$arrayElemAt': ['$totalCount.count', 0]}, 0]},
    }},
]
```
So I need a data structure or indexing strategy that can:
- Precompute something at the top level.
- Allow me to filter objects by a time range.
- Tell me with certainty whether there exists at least one nested interval in that range.
- Work efficiently with pagination (i.e., I can skip irrelevant objects quickly).
🧩 Example
Suppose my query is:
query_start = "2025-01-05T00:00:00"
query_end = "2025-01-06T00:00:00"
This range overlaps none of object 1's meetings, yet I still end up scanning its nested meetings array for no reason, which scales poorly. I want to find the matching top-level objects efficiently, ideally in a way that can be backed by an index.
💠 What I've Considered
- Storing the minimum start time and maximum end time on each top-level object. This shrinks the search space, but still yields many false positives, and for each one I have to iterate the nested array anyway.
🚀 What I’m Looking For
I'm looking for the best data structure, algorithmic approach, or indexing strategy that:
- Reduces or eliminates false positives (so I don’t iterate objects unnecessarily).
- Allows quick filtering by time range.
- Works well with pagination (i.e., sequential fetching of matching objects).
- Can be implemented in MongoDB with reasonable preprocessing.
📊 Constraints
- Each object can have 10–200 nested intervals.
- Typical queries are small date ranges (1–3 days).
- Total objects: 11k+
- Each object is very data-heavy.
- Performance matters — I’d like to minimize per-query iteration.
- I can afford a preprocessing step to build an index or compressed structure.
💬 Question
What is an efficient way to precompute or index nested time ranges so that:
- I can quickly find top-level objects with at least one nested interval overlapping a query range,
- without scanning every nested array,
- and while supporting pagination?
Would appreciate any advice, data structure recommendations (Interval Tree, Segment Tree, compressed range lists, time buckets, etc.), or real-world patterns you’ve used in similar “nested interval query” scenarios.
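Of the structures I listed, time buckets seem to fit the stated access pattern (1-3 day queries) especially well: precompute, per object, the sorted list of calendar days its meetings touch, store it as a top-level array (a hypothetical `meeting_days` field), and put a multikey index on it. At day granularity this gives zero false positives. A sketch under those assumptions:

```python
from datetime import datetime, timedelta

def day_buckets(meetings):
    """Return every calendar day touched by any meeting, as sorted ISO
    date strings. Stored on the parent document, a multikey index on this
    field answers 'any meeting between day A and day B?' exactly."""
    days = set()
    for m in meetings:
        day = datetime.fromisoformat(m["start"]).date()
        last = datetime.fromisoformat(m["end"]).date()
        while day <= last:           # a meeting spanning midnight touches
            days.add(day.isoformat())  # every day in its range
            day += timedelta(days=1)
    return sorted(days)

buckets = day_buckets([
    {"start": "2025-01-01T10:00:00", "end": "2025-01-01T11:00:00"},
    {"start": "2025-01-10T09:00:00", "end": "2025-01-10T09:30:00"},
])
# buckets == ["2025-01-01", "2025-01-10"]

# Query for the example range (any meeting on Jan 5 or Jan 6):
query = {"meeting_days": {"$elemMatch": {"$gte": "2025-01-05", "$lte": "2025-01-06"}}}
```

With ~11k objects and 10-200 meetings each, the bucket arrays stay small, and pagination becomes a plain indexed find with sort/skip/limit instead of a `$facet` over filtered arrays.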
Comment (from another user): "…an index on meetings.start and meetings.end? Your first stage is {'$match': {'is_deleted': False, 'seller_company_id': ObjectId('XXX'), 'is_hidden': False}}. Then you have a lookup and other stages, and only in the 6th stage do meeting.start & meeting.end appear, inside a $filter expression! By then your documents are already being processed in the pipeline/RAM. The index is for the initial disk fetch."
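Following up on that comment, one low-effort change worth trying before any new data structure: move the meeting-range predicate into the first `$match` as an `$elemMatch`, so an index covering `meetings.start_meet` can discard non-matching documents at fetch time, and let the later `$filter` merely trim arrays on documents that already matched. A sketch of just the first stage, reusing the field names from the pipeline above (index shape is an assumption to be checked with `explain()`):

```python
from datetime import datetime, timezone

def first_stage(company_id, range_start, range_end):
    # Candidate supporting index (one-time):
    # db.CollectionA.create_index([
    #     ("seller_company_id", 1), ("is_deleted", 1), ("is_hidden", 1),
    #     ("meetings.start_meet", 1),
    # ])
    return {"$match": {
        "is_deleted": False,
        "seller_company_id": company_id,
        "is_hidden": False,
        "meetings": {"$elemMatch": {
            "is_meeting_deleted": {"$ne": True},
            "start_meet": {"$gte": range_start, "$lte": range_end},
        }},
    }}

stage = first_stage(
    "XXX",  # an ObjectId in practice; a plain string here keeps the sketch dependency-free
    datetime(2025, 10, 7, 18, 30, tzinfo=timezone.utc),
    datetime(2025, 10, 15, 18, 29, 59, 999000, tzinfo=timezone.utc),
)
# The rest of the pipeline ($lookup, $addFields, $filter, $facet) can follow
# unchanged; the trailing {'$match': {'meetings': {'$ne': []}}} becomes redundant.
```

The `$ne` inside `$elemMatch` limits how tight the index bounds can be, but the range on `start_meet` alone should already cut the fetched document count sharply for 1-3 day queries.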