In an e-commerce application I have documents like this:

{ category:'A', ..., price:122,
  attr:{ width:6, height:4, hasLCD:true, lcdType:'some text', ..., a36:null }
}

I.e. every product has many attributes of various simple types.

Now I want to filter products with dynamic queries that combine top-level fields and some attributes. For example:

find({category:'A', price:{$lt:200}, ...,
     'attr.height':{$lt:6}, 'attr.hasLCD':true, 'attr.lcdType':{$in:[...]}, ...})

And I'd like this to perform fast.

Trying to index on all possible 'attr.*' variants gives me an error (too many compound keys). I also suspect that even if I could index them that way, the index wouldn't be used for queries that omit one of the attributes.

Trying to index on 'attr' as a whole does not help either.
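As far as I understand, an index on the embedded document as a whole only supports equality matches against the entire subdocument, so it can't serve per-field or range predicates. A minimal illustration (collection name assumed):

// An index on 'attr' indexes the whole subdocument value, so it only
// helps exact matches against the complete embedded document.
db.products.ensureIndex({ attr: 1 })

// This per-field range predicate cannot use that index:
db.products.find({ 'attr.height': { $lt: 6 } })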

What is the proper way to model this under MongoDB?

Update

I have tried this approach (also mentioned here), i.e. storing attributes as an array of key-value pairs:

attr2: [ {tag:'lcdType', value:'some text'}, ... ]

And index it like this:

ensureIndex({ 'attr2.tag':1, 'attr2.value':1 })

And query like this:

find({attr2:{$all:[
  {$elemMatch:{tag:'bestseller',value:true}},
  {$elemMatch:{tag:'weight',value:{$lte:100}}}
]}})

Now explain() says that it is using "BtreeCursor attr2.tag_1_attr2.value_1", but "nscanned" is still 31607 and the whole execution time has actually increased (compared to the non-indexed scenario).

Something is wrong here.
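One plausible explanation, assuming 2.4-era MongoDB behaviour: with $all of $elemMatch clauses, the index bounds are derived from only one of the clauses, and the remaining clauses are applied as a filter after documents are fetched. If the clause used for the bounds is unselective (bestseller:true may match most of the collection), nscanned stays huge. Leading with the most selective predicate may help:

// Hypothetical reordering: put the clause that narrows the index range
// the most first, so fewer index entries and documents are scanned.
db.products.find({attr2: {$all: [
  {$elemMatch: {tag: 'weight', value: {$lte: 100}}},  // selective range first
  {$elemMatch: {tag: 'bestseller', value: true}}      // filtered afterwards
]}})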

Sub-question

What if I select some (fewer than 31) of the most frequently queried attributes and try to index on those? If I put all of them in a single compound index:

ensureIndex({'attr.a1':1, 'attr.a2':1, ...})

According to the docs, this index won't be used for queries that do not include the attr.a1 attribute.

How to define index in this case?
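One workaround, assuming most queries lean on at least one selective attribute: create a separate single-field index per frequently queried attribute instead of one big compound index. The planner can then pick whichever single index matches the query, and the remaining predicates are applied as a filter on the narrowed result set. A sketch (collection name assumed):

// One small index per popular attribute:
db.products.ensureIndex({ 'attr.a1': 1 })
db.products.ensureIndex({ 'attr.a2': 1 })

// A query that omits attr.a1 can still use the attr.a2 index:
db.products.find({ 'attr.a2': { $lt: 6 }, 'attr.hasLCD': true })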

Comments
  • Do you have more than 31 fields? Commented Oct 30, 2013 at 20:10
  • Yes, that's the reason for 'too many compound keys' error. Commented Oct 30, 2013 at 20:57
  • Creating a compound index on both category and price should be sufficient, especially if you can fit your collection into RAM. Indexing each and every attribute will blow up your disk space and eat all your RAM. There is probably a reasonably limited number of documents (cell phones? cameras?). How many can you expect per category and price range? A few dozen, probably. Even if you have to hit the disk, that is not so bad (see the sketch after these comments). Commented Oct 30, 2013 at 21:05
  • When querying at the shop's top level, i.e. across all categories, with some attributes, it has to scan all 32 thousand products. That takes more than a second on an idle server. This is not too bad, but not too good either. Commented Oct 30, 2013 at 21:30
  • OK. But if you can restrict your query by category and price (range), you can avoid a full scan. Commented Oct 30, 2013 at 23:13
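A minimal sketch of the compound-index suggestion from the comments above (collection name assumed): index the selective top-level fields, and let the attribute predicates filter the already-narrowed candidate set:

db.products.ensureIndex({ category: 1, price: 1 })

// The index narrows by category and price range; 'attr.hasLCD' is then
// checked only against that small candidate set.
db.products.find({ category: 'A', price: { $lt: 200 }, 'attr.hasLCD': true })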

1 Answer

If you really have to allow many filters, combinations, and possibly even sorts, MongoDB is not a good fit, because it uses only one index per query. The number of indexes then grows far too fast, because compound keys are somewhat inflexible (that should answer the sub-question), and maintaining them all becomes a performance hog.

Use a search database like Elasticsearch, Solr, etc. instead, which comes with the features you need. You can then use $in on the ids that the search server returns if you want to keep the base information in MongoDB. (It's usually a good idea to have the search database simply replicate the information of the primary data store, so you don't need to sync changes two ways, which would be a nightmare.)
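A rough sketch of that two-step lookup (the ids are placeholders; the search call itself depends on the engine's client library):

// Step 1 (outside MongoDB): the search server evaluates the faceted
// attribute filters and returns the matching product ids.
var idsFromSearch = [101, 205, 733];  // placeholder result

// Step 2: fetch the full documents for just those ids.
db.products.find({ _id: { $in: idsFromSearch } })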
