Optimize write performance for AWS Aurora instance

Question

I've got an AWS Aurora DB cluster running that is 99.9% focused on writes. At its peak, it will be running 2-3k writes/sec.

I know Aurora is somewhat optimized by default for writes, but I wanted to ask as a relative newcomer to AWS - what are some best practices/tips for write performance with Aurora?

This is not a question about programming. It's probably more appropriate to ask at dba.stackexchange.com, not stackoverflow.com. I've voted to move the question to the dba site.
– Bill Karwin, Commented Sep 23, 2017 at 20:51

Bill Karwin · Accepted Answer · 2021-04-27 17:25:01Z

60

From my experience, Amazon Aurora is unsuited to running a database with heavy write traffic. At least in its implementation circa 2017. Maybe it'll improve over time.

I worked on some benchmarks for a write-heavy application earlier in 2017, and we found that RDS (non-Aurora) was far superior to Aurora on write performance, given our application and database. Basically, Aurora was two orders of magnitude slower than RDS. Amazon's claims of high performance for Aurora are apparently completely marketing-driven bullshit.

In November 2016, I attended the Amazon re:Invent conference in Las Vegas. I tried to find a knowledgeable Aurora engineer to answer my questions about performance. All I could find were junior engineers who had been ordered to repeat the claim that Aurora is magically 5-10x faster than MySQL.

In April 2017, I attended the Percona Live conference and saw a presentation about how to develop an Aurora-like distributed storage architecture using standard MySQL with CEPH for an open-source distributed storage layer. There's a webinar on the same topic here: https://www.percona.com/resources/webinars/mysql-and-ceph, co-presented by Yves Trudeau, the engineer I saw speak at the conference.

What became clear about using MySQL with CEPH is that the engineers had to disable the MySQL change buffer, because there's no way to cache changes to secondary indexes while also having the storage distributed. This caused huge performance problems for writes to tables that have secondary (non-unique) indexes.

This was consistent with the performance problems we saw in benchmarking our application with Aurora. Our database had a lot of secondary indexes.

So if you absolutely have to use Aurora for a database that has high write traffic, I recommend the first thing you must do is drop all your secondary indexes.
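As a rough sketch of what dropping the secondary indexes could look like, the helper below builds a single ALTER TABLE statement from a list of index names. The table and index names are hypothetical; in practice the names would come from something like `SHOW INDEX FROM your_table`, and the list is assumed to already exclude unique indexes and any index that backs a foreign key.

```python
from typing import Iterable


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def drop_secondary_indexes_sql(table: str, indexes: Iterable[str]) -> str:
    """Build one ALTER TABLE that drops every listed index except PRIMARY.

    Returns an empty string when there is nothing to drop.
    """
    clauses = [
        f"DROP INDEX {quote_ident(ix)}"
        for ix in indexes
        if ix.upper() != "PRIMARY"  # never touch the primary key
    ]
    if not clauses:
        return ""
    return f"ALTER TABLE {quote_ident(table)} " + ", ".join(clauses) + ";"


if __name__ == "__main__":
    # Hypothetical table and index names, for illustration only.
    print(drop_secondary_indexes_sql("events", ["PRIMARY", "idx_user", "idx_created"]))
```

Keeping the generated statement around makes it easy to review before running it, and to write the matching CREATE INDEX statements if the indexes have to come back.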

Obviously, this is a problem if the indexes are needed to optimize some of your queries. That means SELECT queries of course, but some UPDATE and DELETE queries may also use secondary indexes.

One strategy might be to make a non-Aurora read replica of your Aurora cluster, and create the secondary indexes only in the read replica to support your SELECT queries. I've never done this, but apparently it's possible, according to https://aws.amazon.com/premiumsupport/knowledge-center/enable-binary-logging-aurora/

But this still doesn't help cases where your UPDATE/DELETE statements need secondary indexes. I don't have any suggestion for that scenario. You might be out of luck.

My conclusion is that I wouldn't choose to use Aurora for a write-heavy application. Maybe that will change in the future.

Update April 2021:

Since writing the above, I have run sysbench benchmarks against Aurora version 2. I can't share the specific numbers, but I conclude that current Aurora is better for write-heavy workloads than it was. I did run tests with lots of secondary indexes to make sure. But I encourage anyone serious about adopting Aurora to run their own benchmarks.

At least, Aurora is much better than conventional Amazon RDS for MySQL using EBS storage. That's probably the basis for their claim that Aurora is 5x faster than MySQL. But Aurora is no faster than some other alternatives I tested, and in fact cannot match:

- MySQL Server installed myself on EC2 instances using local storage, especially i3 instances with locally-attached NVMe. I understand instance storage is not dependable, so one would need to run redundant nodes.
- MySQL Server installed myself on physical hosts in our data center, using direct-attached SSD storage.

The value of using Aurora as a managed cloud database is not just about performance. It also has automated monitoring, backups, failover, upgrades, etc.

edited Apr 27, 2021 at 17:25

answered Sep 23, 2017 at 20:51

Bill Karwin


11 Comments

griffinjt Over a year ago

Thanks for your insight. All of the queries are offloaded and done on a Redshift cluster, so dropping secondary indexes shouldn't be an issue at all since the DB isn't touched for general data analytics. I'd not heard of this issue before but I will give this a shot and see if it makes any difference.

griffinjt Over a year ago

Wow, I can confirm this to be the case. Dropping secondary indexes reduced CPU usage almost in half. Seems like this would be something they need to address.

Fernando Piancastelli Over a year ago

I'm sorry I can only upvote you once. This is exactly the real use-case experience I was trying to read about, because I am (was) considering migrating a similar database to Aurora, and I had to find out if it would help a write-heavy application with a LOT of indexes.

Bill Karwin Over a year ago

@IkerAguayo, It was several years ago, but I recall the app I was working on had about 80:1 ratio of writes versus reads. That's very unusual. Most apps have the opposite ratio, where reads are much more common than writes. I would consider an app write-heavy even if it was a 1:1 ratio of writes versus reads, because even that would be much more writes than a typical app.

Bill Karwin Over a year ago

@Juliano Thanks for sharing your experience with AWS. I've done other benchmarks since 2017, so I updated my answer above.


dz902 · Accepted Answer · 2020-10-14 03:57:23Z

For Googlers:

- Aurora has to write to multiple replicas in real time, so there must be a queue with locking, waiting, and checking mechanisms.
- This inevitably causes very high CPU utilization and lag under continuous write requests, since each write only succeeds once multiple replicas are in sync.
- This behavior has been around from Aurora's inception up until 2020, and it is logically difficult, if not impossible, to solve while keeping the service's low storage cost and fair compute cost.
- High-volume write performance of Aurora MySQL could be more than 10x worse than RDS MySQL (from personal experience, and confirmed by the answers above).

To solve the problem (more like a work-around):

- BE CAREFUL with Aurora if more than 5% of your workload is writes
- BE CAREFUL with Aurora if you need near real-time results from large-volume writes
- Drop secondary indices, as @Bill Karwin points out, to improve write performance
- Batching inserts and updates may improve write performance
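To illustrate the batching point, here is a minimal sketch that splits rows into fixed-size chunks and builds one multi-row INSERT per chunk. The table name, column names, and chunk size are made up for the example, and the `%s` placeholders assume the parameter style used by common Python MySQL drivers; the names passed in are assumed to be trusted, since they are interpolated into the SQL text.

```python
from typing import Any, Iterator, List, Sequence, Tuple

Row = Tuple[Any, ...]


def chunked(rows: Sequence[Row], size: int) -> Iterator[Sequence[Row]]:
    """Yield consecutive slices of `rows`, each holding at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def multi_row_insert(table: str, columns: Sequence[str],
                     rows: Sequence[Row]) -> Tuple[str, List[Any]]:
    """Return (sql, params) for a single INSERT that carries every row."""
    if not rows:
        raise ValueError("rows must not be empty")
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join([row_placeholder] * len(rows))
    )
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


if __name__ == "__main__":
    data = [(1, "a"), (2, "b"), (3, "c")]  # hypothetical rows
    for batch in chunked(data, 2):
        print(multi_row_insert("events", ["id", "name"], batch))
```

Each returned pair can be handed to a driver cursor's `execute(sql, params)`, so one round trip carries a whole chunk instead of one row.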

I said "BE CAREFUL" rather than "DO NOT USE" because many scenarios can be solved by clever architecture design. Database write performance can hardly be depended on.

Chris Zelenak · Accepted Answer · 2018-05-01 17:50:36Z

I had a relatively positive experience w/ Aurora, for my use case. I believe ( time has passed ) we were pushing somewhere close to 20k DML per second, largest instance type ( I think db.r3.8xlarge? ). Apologies for vagueness, I no longer have the ability to get the metrics for that particular system.

What we did:

This system did not require "immediate" response to a given insert, so writes were enqueued to a separate process. This process would collect N queries, and split them into M batches, where each batch correlated w/ a target table. Those batches would be put inside a single txn.

We did this to achieve the write efficiency from bulk writes, and to avoid cross table locking. There were 4 separate ( I believe? ) processes doing this dequeue and write behavior.
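The dequeue-and-write step described above might look roughly like the sketch below: queued writes are grouped by target table, and all the per-table batches are committed in one transaction. This is not the original system's code. It uses Python's built-in `sqlite3` purely as a runnable stand-in for a MySQL connection, and the table names and row shapes are invented.

```python
import sqlite3
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Tuple[Any, ...]
Write = Tuple[str, Row]  # (target table, row values)


def group_by_table(queued: Iterable[Write]) -> Dict[str, List[Row]]:
    """Split the queued writes into one batch per target table."""
    batches: Dict[str, List[Row]] = defaultdict(list)
    for table, row in queued:
        batches[table].append(row)
    return dict(batches)


def flush(conn: sqlite3.Connection, queued: Iterable[Write]) -> int:
    """Write every per-table batch inside a single transaction.

    Table names are assumed to come from trusted code, not user input.
    Returns the number of rows written.
    """
    batches = group_by_table(queued)
    written = 0
    with conn:  # commits on success, rolls back if any batch fails
        for table, rows in batches.items():
            placeholders = ", ".join(["?"] * len(rows[0]))
            conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
            written += len(rows)
    return written


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    conn.execute("CREATE TABLE clicks (id INTEGER, url TEXT)")
    conn.execute("CREATE TABLE views (id INTEGER, page TEXT)")
    queue = [("clicks", (1, "/a")), ("views", (1, "/home")), ("clicks", (2, "/b"))]
    print(flush(conn, queue))
```

Against MySQL the same shape would apply with a driver connection and `%s` placeholders; several such worker processes could each drain their own share of the queue.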

Due to this high write load, we absolutely had to push all reads to a read replica, as the primary generally sat at 50-60% CPU. We vetted this arch in advance by simply creating random data writer processes, and modeled the general system behavior before we committed the actual application to it.
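A random data writer for this kind of up-front vetting could be as simple as the sketch below. It is only an illustration: the row shape is invented, and `write_batch` is a hypothetical callable that would wrap whatever actually sends a batch to the database under test.

```python
import random
import time
from typing import Callable, List, Tuple

Row = Tuple[int, str]


def random_row(rng: random.Random) -> Row:
    """Made-up row shape: an integer key and a short payload string."""
    return rng.randrange(1_000_000), f"payload-{rng.randrange(1000)}"


def run_writer(write_batch: Callable[[List[Row]], None],
               batch_size: int, batches: int, seed: int = 0) -> float:
    """Send `batches` batches of random rows and return rows written per second."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    start = time.perf_counter()
    for _ in range(batches):
        write_batch([random_row(rng) for _ in range(batch_size)])
    elapsed = time.perf_counter() - start
    total = batch_size * batches
    return total / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    sink: List[Row] = []  # stand-in for the database under test
    rate = run_writer(sink.extend, batch_size=100, batches=50)
    print(f"{len(sink)} rows at {rate:.0f} rows/sec")
```

Running several of these in parallel against a test cluster, while watching CPU on the primary, gives a rough picture of how the system behaves before the real application depends on it.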

The writes were almost all INSERT ON DUPLICATE KEY UPDATE writes, and the tables had a number of secondary indexes.
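For reference, a batched statement of that kind can be generated along the lines of the sketch below. The table and column names are hypothetical. It uses the `VALUES(col)` form in the update clause, which works on MySQL 5.7-compatible servers but is deprecated in newer MySQL 8.0 releases in favor of a row alias, so treat the exact syntax as an assumption to check against your engine version.

```python
from typing import Sequence


def upsert_sql(table: str, columns: Sequence[str],
               key_columns: Sequence[str], row_count: int) -> str:
    """Build a multi-row INSERT ... ON DUPLICATE KEY UPDATE statement.

    Every non-key column is overwritten with the incoming value on conflict.
    Names are interpolated directly, so they must come from trusted code.
    """
    if row_count < 1:
        raise ValueError("row_count must be at least 1")
    update_cols = [c for c in columns if c not in key_columns]
    if not update_cols:
        raise ValueError("need at least one non-key column to update")
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([row] * row_count)
    updates = ", ".join(f"{c} = VALUES({c})" for c in update_cols)
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values} "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


if __name__ == "__main__":
    # Hypothetical counters table keyed on id.
    print(upsert_sql("counters", ["id", "hits"], ["id"], 2))
```

The parameters for such a statement are just the row values flattened in order, one per `%s` placeholder.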

I suspect this approach worked for us simply because we were able to tolerate a delay between when information appeared in the system and when readers would actually need it, which allowed us to batch in much larger amounts. YMMV.
