0

I have a collection orders

{
    "_id": "abcd",
    "last_modified": ISODate("2016-01-01T00:00:00Z"),
    "suborders": [
        {
            "suborder_id": "1",
            "last_modified: ISODate("2016-01-02T00: 00: 00Z")
        },  {
            "suborder_id":"2",
            "last_modified: ISODate("2016-01-03T00:00:00Z")
        }
    ]
}

I have two indexes on this collection: {"last_modified":1} {"suborders.last_modified": 1}

when I use range queries on last_modified, indexes are properly used, and results are returned instantly. eg query: db.orders.find({"last_modified":{$gt:ISODate("2016-09-15"), $lt:ISODate("2016-09-16")}});

However, when I am querying on suborders.last_modified, the query takes too long to execute. eq query:db.orders.find({"suborders.last_modified":{$gt:ISODate("2016-09-15"), $lt:ISODate("2016-09-16")}});

Please help debug this.

1 Answer 1

1

The short answer is to use min and max to set the index bounds correctly. For how to approach debugging, read on.

A good place to start for query performance issues is to attach .explain() at the end of your queries. I made a script to generate documents like yours and execute the queries you provided.

I used mongo 3.2.9 and both queries do use the created indices with this setup. However, the second query was returning many more documents (approximately 6% of all the documents in the collection). I suspect that is not your intention.

To see what is happening lets consider a small example in the mongo shell:

> db.arrayFun.insert({
    orders: [
        { last_modified: ISODate("2015-01-01T00:00:00Z") },
        { last_modified: ISODate("2016-01-01T00:00:00Z") }
    ]
})
WriteResult({ "nInserted" : 1 })

then query between May and July of 2015:

> db.arrayFun.find({"orders.last_modified": {
    $gt: ISODate("2015-05-01T00:00:00Z"),  
    $lt: ISODate("2015-07-01T00:00:00Z")
}}, {_id: 0})
{ "orders" : [ { "last_modified" : ISODate("2015-01-01T00:00:00Z") }, { "last_modified" : ISODate("2016-01-01T00:00:00Z") } ] }

Although neither object in the array has last_modified between May and July, it found the document. This is because it is looking for one object in the array with last_modified greater than May and one object with last_modified less than July. These queries cannot intersect multikey index bounds, which happens in your case. You can see this in the indexBounds field of explain("allPlansExecution") output, in particular one of the lower bound or upper bound Date will not be what you specified. This means that a large number of documents may need to be scanned to complete the query depending on your data.

To find objects in the array that have last_modified between two bounds, I tried using $elemMatch.

db.orders.find({"suborders": {
    $elemMatch:{ 
        last_modified:{
            "$gt":ISODate("2016-09-15T00:00:00Z"), 
            "$lt":ISODate("2016-09-16T00:00:00Z")
         }
     }
}})

In my test this returned about 0.5% of all documents. However, it was still running slow. The explain output showed it was still not setting the index bounds correctly (only using one bound).

What ended up working best was to manually set the index bounds with min and max.

db.subDocs.find()
    .min({"suborders.last_modified":ISODate("2016-09-15T00:00:00Z")})
    .max({"suborders.last_modified":ISODate("2016-09-16T00:00:00Z")})

Which returned the same documents as $elemMatch but used both bounds on the index. It ran in 0.021s versus 2-4s for elemMatch and the original find.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.