
I have a SQL-based application and I would like to cache its results using Redis. You can think of the application as an address book with multiple SQL tables. The application performs the following tasks:

40% of the time:

  • Create a new record / Update an existing record
  • Bulk update multiple records
  • Review an existing record

60% of the time:

  • Search records based on user's criteria

This is my current approach:

  • The system caches a record when that record is created or updated.
  • When a user performs a search, the system caches the query result.

On top of that, I have a Redis look-up table (a Redis Set) that maps each MySQL record ID to the Redis cache keys it appears in. That way I can delete the cached entries when a MySQL record changes (e.g., during a bulk update).
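The invalidation scheme above can be sketched as a small reverse index. This is only an illustration of the idea: a plain Python dict stands in for Redis here (with redis-py you would use SADD/SMEMBERS on one set per record ID), and the function names are my own, not from the application.

```python
# Minimal sketch of the reverse-index invalidation scheme described above.
# A dict stands in for Redis; cache keys and record IDs are illustrative.

cache = {}            # cache_key -> cached search result
record_to_keys = {}   # record_id -> set of cache keys that contain that record

def cache_search(cache_key, result_rows):
    """Cache a search result and index it by every record it contains."""
    cache[cache_key] = result_rows
    for row in result_rows:
        record_to_keys.setdefault(row["id"], set()).add(cache_key)

def invalidate_record(record_id):
    """Delete every cached search that contains the changed record."""
    for key in record_to_keys.pop(record_id, set()):
        cache.pop(key, None)
```

Note that this only invalidates caches containing records that *changed*; as the question goes on to explain, it cannot catch caches that a *newly created* record should have appeared in.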

What if a new record is created after the system has cached a search result? If the new record matches the search criteria, the system will keep returning the stale cache (which does not include the new record) until that cache is deleted, and that won't happen until an existing record in the cache is updated.

The searches are driven by the users, and the combinations of search conditions are countless. It is not possible to determine which caches should be deleted when a new record is created.

So far, the only solution I see is to remove all caches for a MySQL table when a record is created. However, this is not a good option because many records are created daily.
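For what it's worth, one common way to make that "remove all caches for a table" step cheap is a per-table version key: every cache key embeds the table's current version, and creating a record just increments the version, so all old keys become unreachable in O(1) and expire on their own. This is an assumption on my part, not something described in the question; the dict again stands in for Redis (GET/SET/INCR).

```python
# Hedged sketch: per-table version numbers make "flush all caches for a
# table" a single increment instead of a mass delete. The key names are
# illustrative, not from the application in the question.

store = {}  # stands in for Redis

def table_version(table):
    return store.get(f"version:{table}", 0)

def bump_version(table):
    # Call on every insert; stale keys are simply never read again.
    store[f"version:{table}"] = table_version(table) + 1

def search_key(table, criteria):
    # Embed the current version so old cached results are unreachable.
    return f"{table}:v{table_version(table)}:{criteria}"
```

In Redis you would give the versioned result keys a TTL so the unreachable generations eventually free their memory.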

In this situation, what's the best way to implement Redis on top of MySQL?

  • I think it is better to identify your most frequent queries and start working on those. The description you gave is too abstract; if you add more details on the data structure and queries, then I can try to help you create some Redis structures that will give a very fast answer to your searches. Commented Aug 31, 2015 at 19:41
  • I wish I could give you an answer. Unfortunately, the search queries depend entirely on user preferences and the combinations are countless. I ended up caching at the page level, i.e., no identical SQL query is run twice for a given page load. Commented Sep 1, 2015 at 5:30

2 Answers


Here's a surprising thing when it comes to PHP and MySQL (I am not sure about other languages): not caching into Memcached or Redis is actually faster. Much faster. Basically, if you just built your app and queried MySQL directly, you'd get more out of it.

Now for the "why" part.

InnoDB, the default engine, is a superb engine. Specifically, its memory management (allocation and so on) is superior to any in-memory storage solution. That's a fact; you can look it up or take my word for it. It will, at the very least, perform as well as Redis.

Now what happens in your app: you query MySQL and cache the result into Redis. However, MySQL is also smart enough to keep cached results itself. What you just did is add the file descriptors required to connect to Redis, and use extra storage (RAM) to cache a result that MySQL has already cached.

Here comes another interesting part: the preferred way of serving PHP scripts is php-fpm, which is much quicker than any mod_* alternative out there. At its core, php-fpm is a supervisor process that spawns child processes. Those children don't shut down after a script is served, which means they can keep their MySQL connections open: connect once, use many times. So if you serve scripts through php-fpm, they reuse already-established connections to MySQL, meaning you won't be opening and closing a connection for each request. This is extremely resource-friendly and gives you a lightning-fast path to MySQL. And MySQL, being memory-efficient and already holding the cached result, is quicker than Redis.

What does all of this mean for you? A proper setup lets you keep your code small and simple, avoids Redis entirely, eliminates all the cache-invalidation problems you might have, and doesn't waste memory holding the same data twice.

Ingredients you need for this to work:

  • php-fpm
  • MySQL with InnoDB-based tables and, above all, sufficient RAM and a tuned innodb_buffer_pool_size variable. That variable controls how much RAM InnoDB is allowed to allocate for its purposes; the larger, the better.

You've eliminated Redis from the game, kept your code simple and easy to maintain, avoided duplicating data, introduced no additional system, and let the software that's meant to take care of data do its job. That's a cheap trade-off for maximum benefit; even if you compile all the software from scratch, it won't take more than an hour or so to get it up and running.

Or, you can just ignore what I wrote and look for a solution using Redis.


Comments

Thank you for your suggestions. We've done all kinds of optimizations, such as MySQL indexing, Apache configuration, etc. However, for complicated searches over a table with millions of records, it is better to cache them. In fact, we found that the overall time dropped from 5 seconds to 1 second after the cache was implemented.
Millions of records is a drop in the sea. I work with significantly larger databases, and ones spanning 50-100 million records are searchable within milliseconds if you set the software up properly. The fact that it takes 5 seconds is a good indication that your MySQL is most likely running the default configuration, which means it has only 8 MB of RAM allocated to it. Naturally, Redis will outperform that. Do try my suggestions and tweak your MySQL. There is no chance that doing the same work twice is faster than doing it once, and doing it twice is exactly what happens when you use Redis.
Thanks for your comment. We have already tweaked the MySQL settings and upgraded the hardware. What I meant by 5 seconds was the total time for the system to prepare the data, which includes running hundreds of SQL queries against multiple tables. The main reason we want to cache the SQL results is that they are the bottleneck. If caching is implemented correctly (i.e., purging the cache at the right time), it will dramatically improve overall performance and save us a lot of money on hardware.
So you are actually suggesting that scaling a MySQL db horizontally is as easy as scaling Redis, and that it makes more sense to add MySQL nodes when your db is drowning instead of adding a caching layer to take the heat off the actual db engine. Does this also hold when your db is TB-sized? Is it even possible to painlessly add cheap nodes to such a db? Imagine an OSM Postgres scenario where building the indexes can take weeks even on extremely expensive hardware. I don't think I am the one standing behind an unjustified preference here :)
I am really sorry to have made you so upset. Honestly, I am not trying to drag you anywhere. You seem too heated on the subject to actually argue about it. I just genuinely think that your answer lacks credibility. If you think my comment lacks credibility in turn, you can just live with it. Sorry again; I didn't mean to provoke you or anything.

We ran into the same problem and chose to do the same thing you are considering: remove all query caches affected by the table. It is not ideal, as you said, but fortunately our write load is not as high as 40%, so it has been OK so far. That's the nature of query-based caching. As an alternative, you can add entity-based caching: instead of caching only the search results, cache the entire table and run the searches in memory. We use C# LINQ, so we can run fairly common queries in memory, but if the search is too complicated you are out of luck.
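A rough sketch of that entity-based approach, in Python rather than C# (list comprehensions standing in for LINQ; the table name, rows, and function names are illustrative, not from the answer):

```python
# Entity-based caching: load the whole table once, then answer searches by
# filtering in memory instead of caching individual query results. Creating
# or updating a record only requires refreshing one table cache.

table_cache = {}  # table name -> list of row dicts

def load_table(name, fetch_rows):
    """fetch_rows is whatever callable pulls all rows from the database."""
    table_cache[name] = fetch_rows()

def search(name, predicate):
    """Filter the cached table in memory with an arbitrary predicate."""
    return [row for row in table_cache.get(name, []) if predicate(row)]
```

As the answer notes, this works for common filter-style queries but breaks down once the searches involve joins or logic too complicated to express as an in-memory predicate.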

