
In one of our applications, we need to hold some plain tabular data and we need to be able to perform user-side autocompletion on one of the columns.

The initial solution we came up with was to couple MySQL with Solr: MySQL holds the data, and Solr holds just the tokenized column and returns ids as results. But something unpleasant happened recently: developers started storing some of the data in Solr, since nothing about the MySQL table or the operations done on it is beyond what Solr can provide. So we thought we might merge the two and eliminate one of them.

So we had to either (1) move all the data to Solr, or (2) use MySQL for the autocompletion.

Option (1) sounded terrible, so I gave (2) a shot. I loaded that single column's data into MySQL, disabled all caches on both MySQL and Solr, wrote a tiny web app that performs very similar queries [1] against both databases, and ran a few JMeter scenarios against each in comparable local environments. The results show a 2.5-3.5x advantage for Solr; however, I suspect the methodology may be flawed and the results wrong.

So, what would you suggest for:

  1. Correctly benchmarking these two systems. I believe the JVM needs to be given an environment comparable to the one MySQL gets.
  2. Designing this system.
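For context, the measurement loop I'd like to get right looks roughly like this sketch (Python just to illustrate the idea; `query_fn` stands in for the real HTTP call to MySQL or Solr, and the warm-up phase is my assumption about what a JIT-compiled JVM service needs before timings are meaningful):

```python
import time
import statistics

def benchmark(query_fn, queries, warmup_rounds=200, measured_rounds=1000):
    """Run discarded warm-up queries first (so JIT/caches stabilise),
    then record per-request latency for the measured rounds."""
    # Warm-up phase: results are thrown away.
    for i in range(warmup_rounds):
        query_fn(queries[i % len(queries)])

    # Measured phase: record each request's latency.
    latencies = []
    for i in range(measured_rounds):
        start = time.perf_counter()
        query_fn(queries[i % len(queries)])
        latencies.append(time.perf_counter() - start)

    latencies.sort()
    return {
        "median": statistics.median(latencies),
        "p95": latencies[int(0.95 * len(latencies)) - 1],
    }
```

Reporting percentiles rather than a single average is deliberate: tail latency is what the autocomplete user actually feels.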

Thanks for any leads.

[1] SELECT column FROM table WHERE column LIKE 'USER-INPUT%' on MySQL and column:"USER-INPUT" on Solr.
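In code, building the two queries from [1] might look like the following sketch (the escaping of LIKE wildcards and of quotes in the Solr query string are my additions; `column` and `table` are placeholders exactly as in [1]):

```python
def mysql_prefix_query(user_input):
    """Parameterised MySQL prefix query; % and _ in the user input are
    escaped so they match literally instead of acting as LIKE wildcards."""
    escaped = (user_input.replace("\\", "\\\\")
                         .replace("%", "\\%")
                         .replace("_", "\\_"))
    return "SELECT column FROM table WHERE column LIKE %s", escaped + "%"

def solr_query(user_input):
    """Solr phrase query on the same column, as in [1]; embedded
    double quotes are backslash-escaped."""
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    return 'column:"%s"' % escaped
```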

1 Answer


I recently moved a website from getting its data from the database (Postgres) to getting all data from Solr. The difference in speed is unbelievable. We also have autocomplete for Australian suburbs (about 15K of them), and it finds them in a couple of milliseconds, so the Ajax autocomplete (we used jQuery) reacts almost instantly.

All updates are done against the original database, but ours is a mostly-read site. We used triggers to fire events when records are updated, and that spawns a reindex of the record into Solr.
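A sketch of what the reindex step can boil down to, assuming the trigger (or a queue consumer it feeds) posts JSON to Solr's /update handler - the host, core name, and field names below are illustrative, not our exact setup:

```python
import json

def solr_update_payload(record, id_field="id"):
    """Build the JSON body for Solr's /update handler from a changed
    database row. Because Solr overwrites by unique key, re-posting the
    document is all a reindex takes."""
    # Drop NULL columns - absent fields are simply unset in Solr.
    doc = {k: v for k, v in record.items() if v is not None}
    assert id_field in doc, "Solr needs the unique key to replace the old doc"
    return json.dumps([doc])
```

You would POST this to something like `http://solr-host:8983/solr/<core>/update?commitWithin=1000`; `commitWithin` lets Solr batch commits instead of committing on every single trigger firing.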

The other big speed improvement was pre-caching the data required to render the items, i.e. we denormalize data and pre-calculate lots of things at Solr indexing time, so rendering is easy for the web guys and super fast.
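As a sketch, indexing-time denormalization looks something like this (field names are invented for illustration): the lookup that would be a JOIN at query time in SQL is resolved once, before indexing, so queries never need it.

```python
def denormalize(suburb_row, states_by_id):
    """Flatten a normalised suburb row into one self-contained Solr doc.
    The state lookup happens here, at indexing time, so rendering needs
    only this single document."""
    state = states_by_id[suburb_row["state_id"]]
    return {
        "id": suburb_row["id"],
        "suburb": suburb_row["name"],
        # Repeated and derived data is fine in Solr - optimise for reads.
        "state_name": state["name"],
        "display": "%s, %s" % (suburb_row["name"], state["abbrev"]),
    }
```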

Another advantage is that we can put our site into read-only mode if the database needs to be taken offline for some reason - we just fall back to Solr. At least the site doesn't go fully down.

I would recommend using Solr as much as possible, for both speed and scalability.


3 Comments

That sounds nice. I'm fairly new to Solr, can you do JOINs and GROUP BYs on tables? Our data in this table is in the order of 5 million records. Would you still suggest usage of Solr? Thanks!
Solr has a loose schema, so everything is stored in a SINGLE table. Each Solr core (index) corresponds to a database, not a table. As for GROUP BY, look up faceting: wiki.apache.org/solr/SolrFacetingOverview
@parsa Solr is fine for holding huge datasets. There is no concept of a join. Essentially, you index Solr "documents" (just a bunch of field/value pairs), so denormalize as much as you like, so that everything you need to render is saved in the document. Don't worry about the documents being too large or repeating data - it's all about speed!
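To make the faceting suggestion above concrete, here is a sketch of the Solr request parameters that roughly correspond to a SQL `SELECT field, COUNT(*) ... GROUP BY field` (the query and field name are illustrative):

```python
def facet_params(query, group_by_field):
    """Solr request parameters roughly equivalent to a SQL GROUP BY
    count over group_by_field."""
    return {
        "q": query,
        "rows": 0,              # we want only the counts, not documents
        "facet": "true",
        "facet.field": group_by_field,
        "facet.mincount": 1,    # skip empty buckets
    }
```

The response's `facet_counts` section then lists each distinct value of the field with its document count.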
