
In one of our applications, we need to hold some plain tabular data and we need to be able to perform user-side autocompletion on one of the columns.

The initial solution we came up with was to couple MySQL with Solr: MySQL holds the data, and Solr holds just the tokenized column and returns ids as results. But something unpleasant happened recently: developers started storing some of the data in Solr, since nothing about the MySQL table or the operations done on it is beyond what Solr can provide. So we thought we might merge the two and eliminate one of them.

So we had to either (1) move all the data to Solr, or (2) use MySQL for the autocompletion.

Option (1) sounded terrible, so I gave (2) a shot. I loaded that single column's data into MySQL, disabled all caches on both MySQL and Solr, wrote a tiny web app that performs very similar queries [1] against both databases, and ran a few JMeter scenarios against each in comparable local environments. The results show a 2.5-3.5x advantage for Solr; however, I suspect the methodology may be flawed and the results wrong.

So, what would you suggest for:

  1. Correctly benchmarking these two systems. I believe the JVM needs to be given an environment comparable to the one MySQL gets.
  2. Designing this system.
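For context, the measurement loop I'd like to get right looks roughly like this sketch (Python just to illustrate the idea; `query_fn` stands in for the real HTTP call to MySQL or Solr, and the warm-up phase is my assumption about what a JIT-compiled JVM service needs before timings are meaningful):

```python
import time
import statistics

def benchmark(query_fn, queries, warmup_rounds=200, measured_rounds=1000):
    """Run discarded warm-up queries first (so JIT/caches stabilise),
    then record per-request latency for the measured rounds."""
    # Warm-up phase: results are thrown away.
    for i in range(warmup_rounds):
        query_fn(queries[i % len(queries)])

    # Measured phase: record each request's latency.
    latencies = []
    for i in range(measured_rounds):
        start = time.perf_counter()
        query_fn(queries[i % len(queries)])
        latencies.append(time.perf_counter() - start)

    latencies.sort()
    return {
        "median": statistics.median(latencies),
        "p95": latencies[int(0.95 * len(latencies)) - 1],
    }
```

Reporting percentiles rather than a single average is deliberate: tail latency is what the autocomplete user actually feels.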

Thanks for any leads.

[1] SELECT column FROM table WHERE column LIKE 'USER-INPUT%' on MySQL and column:"USER-INPUT" on Solr.
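In code, building the two queries from [1] might look like the following sketch (the escaping of LIKE wildcards and of quotes in the Solr query string are my additions; `column` and `table` are placeholders exactly as in [1]):

```python
def mysql_prefix_query(user_input):
    """Parameterised MySQL prefix query; % and _ in the user input are
    escaped so they match literally instead of acting as LIKE wildcards."""
    escaped = (user_input.replace("\\", "\\\\")
                         .replace("%", "\\%")
                         .replace("_", "\\_"))
    return "SELECT column FROM table WHERE column LIKE %s", escaped + "%"

def solr_query(user_input):
    """Solr phrase query on the same column, as in [1]; embedded
    double quotes are backslash-escaped."""
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    return 'column:"%s"' % escaped
```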

1 Answer


I recently moved a website from getting its data from the database (Postgres) to getting all data from Solr. The difference in speed is unbelievable. We also have autocomplete for Australian suburbs (about 15K of them), and it finds them in a couple of milliseconds, so the Ajax autocomplete (we used jQuery) reacts almost instantly.

All updates are done against the original database, but ours is a mostly-read site. We used triggers to fire events when records are updated, and that spawns a reindex of the record into Solr.
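A sketch of what the reindex step can boil down to, assuming the trigger (or a queue consumer it feeds) posts JSON to Solr's /update handler - the host, core name, and field names below are illustrative, not our exact setup:

```python
import json

def solr_update_payload(record, id_field="id"):
    """Build the JSON body for Solr's /update handler from a changed
    database row. Because Solr overwrites by unique key, re-posting the
    document is all a reindex takes."""
    # Drop NULL columns - absent fields are simply unset in Solr.
    doc = {k: v for k, v in record.items() if v is not None}
    assert id_field in doc, "Solr needs the unique key to replace the old doc"
    return json.dumps([doc])
```

You would POST this to something like `http://solr-host:8983/solr/<core>/update?commitWithin=1000`; `commitWithin` lets Solr batch commits instead of committing on every single trigger firing.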

The other big speed improvement was pre-caching the data required to render the items, i.e. we denormalize data and pre-calculate lots of things at Solr indexing time, so rendering is easy for the web guys and super fast.
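As a sketch, indexing-time denormalization looks something like this (field names are invented for illustration): the lookup that would be a JOIN at query time in SQL is resolved once, before indexing, so queries never need it.

```python
def denormalize(suburb_row, states_by_id):
    """Flatten a normalised suburb row into one self-contained Solr doc.
    The state lookup happens here, at indexing time, so rendering needs
    only this single document."""
    state = states_by_id[suburb_row["state_id"]]
    return {
        "id": suburb_row["id"],
        "suburb": suburb_row["name"],
        # Repeated and derived data is fine in Solr - optimise for reads.
        "state_name": state["name"],
        "display": "%s, %s" % (suburb_row["name"], state["abbrev"]),
    }
```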

Another advantage is that we can put our site into read-only mode if the database needs to be taken offline for some reason - we just fall back to Solr. At least the site doesn't go fully down.

I would recommend using Solr as much as possible, for both speed and scalability.


3 Comments

That sounds nice. I'm fairly new to Solr, can you do JOINs and GROUP BYs on tables? Our data in this table is in the order of 5 million records. Would you still suggest usage of Solr? Thanks!
Solr has a loose schema, so everything is stored in a SINGLE table. Each Solr core (index) corresponds to a database, not a table. As for GROUP BY, look up faceting: wiki.apache.org/solr/SolrFacetingOverview
@parsa Solr is fine for holding huge datasets. There is no concept of a join. Essentially, you index Solr "documents" (just a bunch of field/value pairs), so denormalize as much as you like, so that everything you need to render is saved in the document. Don't worry about the documents being too large or repeating data - it's all about speed!
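To make the faceting suggestion above concrete, here is a sketch of the Solr request parameters that roughly correspond to a SQL `SELECT field, COUNT(*) ... GROUP BY field` (the query and field name are illustrative):

```python
def facet_params(query, group_by_field):
    """Solr request parameters roughly equivalent to a SQL GROUP BY
    count over group_by_field."""
    return {
        "q": query,
        "rows": 0,              # we want only the counts, not documents
        "facet": "true",
        "facet.field": group_by_field,
        "facet.mincount": 1,    # skip empty buckets
    }
```

The response's `facet_counts` section then lists each distinct value of the field with its document count.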
