I have the following document:

{
    'date': date,
    '_id': ObjectId,
    'Log': [
        {
            'lat': float,
            'lng': float,
            'date': float,
            'speed': float,
            'heading': float,
            'fix': float
        }
    ]
}

For a single document, the Log array can hold several hundred entries.

I need to query the first and last date element of Log on each document. I know how to query it, but I need to do it fast, so I would like to build an index for that. I don't want to index Log.date, since that index would be too big. How can I index only the first and last dates?

  • It is somewhat unclear to me what you want and whether it is related to indexing at all. Are you saying you need to find the first (minimum) and last (maximum) date within that array? And since it is an array, are the entries possibly inserted in order, or could they at least be kept ordered that way? Commented Feb 28, 2014 at 0:33
  • I updated my question. I know how to query it, but I would like to access it very fast, so I want to index it... and I don't know how. Commented Feb 28, 2014 at 14:26
  • Then how is the answer that I gave not very fast? Please read and see what matches your conditions. Commented Feb 28, 2014 at 14:48

2 Answers

It's hard to advise without knowing how you work with the documents. One possible solution is a sparse index. You add a new field to the first and last array element of every document, let's call it shouldIndex, and then create a sparse index that includes the shouldIndex and date fields. Here's a short example:

Assume we have this document:

{"Log": 
    [{'lat': 1, 'lng': 2, 'date': new Date(), shouldIndex : true}, 
    {'lat': 3, 'lng': 4, 'date': new Date()}, 
    {'lat': 5, 'lng': 6, 'date': new Date()}, 
    {'lat': 7, 'lng': 8, 'date': new Date(), shouldIndex : true}]}

Please note that only the first and last elements contain the shouldIndex field.
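
The tagging step could be sketched in plain JavaScript as below. The helper name tagEndpoints is hypothetical, and the sketch assumes the Log array is already ordered by date, so that the first and last entries carry the earliest and latest dates.

```javascript
// Hypothetical helper: returns a copy of the log in which only the first
// and last entries carry shouldIndex. Assumes the log is ordered by date.
function tagEndpoints(log) {
  const last = log.length - 1;
  return log.map((entry, i) => {
    const { shouldIndex, ...rest } = entry; // drop any stale flag
    return i === 0 || i === last ? { ...rest, shouldIndex: true } : rest;
  });
}

const tagged = tagEndpoints([
  { lat: 1, lng: 2, date: 1 },
  { lat: 3, lng: 4, date: 2 },
  { lat: 5, lng: 6, date: 3 },
]);
console.log(tagged);
```

Because a previously tagged entry stops being the last one as soon as a new entry is appended, the flag has to be recomputed on every write.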

db.testSparseIndex.ensureIndex( { "Log.shouldIndex": 1, "Log.date": 1 }, { sparse: true } )

This index should contain entries only for your first and last elements.

Alternatively, you may store the date fields of the first and last elements in a separate array.
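
One way to keep such separately stored values current is to maintain them in the same write that appends a log entry. The sketch below only builds the update document. It uses two top-level fields rather than an array, and the names firstDate and lastDate are assumptions, not part of the original schema. $min and $max are MongoDB update operators that overwrite a field only when the new value is smaller or larger than the stored one, respectively.

```javascript
// Hypothetical helper: builds an update that appends a log entry and keeps
// two assumed top-level fields, firstDate and lastDate, in sync with it.
function buildAppendUpdate(entry) {
  return {
    $push: { Log: entry },
    $min: { firstDate: entry.date }, // only lowers firstDate
    $max: { lastDate: entry.date },  // only raises lastDate
  };
}

const update = buildAppendUpdate({ lat: 1, lng: 2, date: 1393545600 });
console.log(JSON.stringify(update));
```

The result could be passed as the second argument of db.collection.update({ _id: id }, update). An ordinary index on firstDate and lastDate would then hold one entry per document instead of one per log entry.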

For more info on sparse indexes please refer to this article.

Hope it helps!

2 Comments

Worth noting here that if this is indeed a reflection of what the data looks like, then in order to tag it in such a way you would need to find the first and last dates before you could even apply a sparse index. Which kind of leads me to my point.
Yes, thanks. I do know how to access the first and last elements. This was not the scope of my question. I will try it out. Thanks.

So there was an answer about indexing that is fundamentally correct. As of writing though it seems a little unclear whether you are talking about indexing at all. It almost seems like what you want to do is get the first and last date from the elements in your array.

With that in mind there are a few approaches:

1. The elements in your array have been naturally inserted in increasing date values

If every write to this field uses the $push operator, entries arrive over time in date order, and you never update existing items (at least not their dates), then your items are already in order.

This means you can simply take the first and last elements of the array:

db.collection.find({ _id: id },{ Log: {$slice: 1 }});    // gets the first element
db.collection.find({ _id: id },{ Log: {$slice: -1 }});   // gets the last element

Now of course that is two queries but it's a relatively simple operation and not costly.
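
For reference, a $slice projection with a single number n keeps the first n elements when n is positive and the last |n| elements when n is negative. A plain JavaScript emulation of that behaviour might look like this (sliceProjection is a hypothetical name, not a MongoDB function, and the sample data is made up):

```javascript
// Emulates { Log: { $slice: n } } for a non-zero n on an in-memory array.
function sliceProjection(arr, n) {
  return n > 0 ? arr.slice(0, n) : arr.slice(n);
}

// Sample log, assumed to be ordered by date already.
const log = [{ date: 10 }, { date: 20 }, { date: 30 }];

const firstDate = sliceProjection(log, 1)[0].date;  // 10
const lastDate = sliceProjection(log, -1)[0].date;  // 30
console.log(firstDate, lastDate);
```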

2. For some reason your elements are not naturally ordered by date

If this is the case, or if you simply can't live with the two-query form, you can get the first and last values with an aggregation that uses the $min and $max accumulators:

db.collection.aggregate([

    // You might want to match first. Just doing one _id here. (commented)
    //{"$match": { "_id": id }},

    //Unwind the array
    {"$unwind": "$Log" },

    // Group per document and take the minimum and maximum date
    {"$group": { 
        "_id": "$_id",
        "firstDate": {"$min": "$Log.date" },
        "lastDate": {"$max": "$Log.date" }
    }}

])
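
To make the result shape concrete, here is a minimal in-memory sketch of what the $unwind and $group stages compute. The function name minMaxDates and the sample documents are hypothetical, and dates are assumed to be numeric, as in the schema from the question.

```javascript
// In-memory equivalent of $unwind on Log followed by $group with $min/$max.
// Documents with an empty Log are skipped, as $unwind would drop them.
function minMaxDates(docs) {
  return docs
    .filter((doc) => doc.Log.length > 0)
    .map((doc) => {
      const dates = doc.Log.map((entry) => entry.date);
      return {
        _id: doc._id,
        firstDate: Math.min(...dates),
        lastDate: Math.max(...dates),
      };
    });
}

const summary = minMaxDates([
  { _id: "a", Log: [{ date: 30 }, { date: 10 }, { date: 20 }] },
  { _id: "b", Log: [] },
]);
console.log(summary);
```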

Finally, if what you need is the full array elements that carry the first and last dates, we can do that as well, somewhat mirroring the initial two-query form, by using $first and $last:

db.collection.aggregate([

    // You might want to match first. Just doing one _id here. (commented)
    //{"$match": { "_id": id }},

    //Unwind the array
    {"$unwind": "$Log" },

    // Sort the results on the date
    {"$sort": { "_id": 1, "Log.date": 1 }},

    // Group using $first and $last
    {"$group": { 
        "_id": "$_id",
        "firstLog": {"$first": "$Log" },
        "lastLog": {"$last": "$Log" }
    }}

])
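
The same idea, sketched in memory for illustration: sort each document's entries by date and keep the first and last whole entries. The name firstAndLastLog and the sample data are hypothetical, and dates are again assumed to be numeric.

```javascript
// In-memory equivalent of $unwind, $sort on Log.date, then $group with
// $first and $last. Returns whole log entries, not just their dates.
function firstAndLastLog(docs) {
  return docs
    .filter((doc) => doc.Log.length > 0)
    .map((doc) => {
      // Copy before sorting so the input document is not mutated.
      const sorted = [...doc.Log].sort((a, b) => a.date - b.date);
      return {
        _id: doc._id,
        firstLog: sorted[0],
        lastLog: sorted[sorted.length - 1],
      };
    });
}

const endpoints = firstAndLastLog([
  { _id: "a", Log: [{ lat: 5, date: 30 }, { lat: 1, date: 10 }, { lat: 3, date: 20 }] },
]);
console.log(endpoints);
```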

Your mileage may vary, but these approaches may remove the need for an index, if this would indeed be the only use for that index.
