Aerospike client not adhering to connectionQueueSize

Question

We have a Go application running in a highly concurrent environment. We do several Aerospike reads for every request, spawning multiple goroutines to do so. Intermittently, our clients exceed the connectionQueueSize limit. Here is the client policy:

      "ConnectionTimeoutMs": 1000,      
      "SocketTimeoutMs": 50,
      "TimeoutMs": 100,
      "ConnectionQueueSize": 150,
      "LimitConnectionsToQueueSize": true

We tried increasing and decreasing connectionQueueSize, and also using the default values, but the result was the same.

Recently we changed the client to be a singleton. We also tried setting MaxRetries to 0, in case retries were pulling additional connections from the pool across multiple attempts. After setting it to 0, I get the following error:

command execution timed out on client: Exceeded number of retries. See Policy.MaxRetries.

Why does it say the number of retries was exceeded when MaxRetries is set to 0?

In summary, none of these have helped. Any pointer is appreciated. Thanks.

Please note: we use Aerospike Community Edition.

Robert Glonek · Accepted Answer · 2024-11-12 21:10:04Z

To fully understand the issue, we would need some code snippets showing how the policy object is initialized and how its values are set. First, make sure you use the constructor functions, which fill in the client's defaults, rather than declaring the object with a struct literal (...{}), for example:

policy := aerospike.NewClientPolicy()
policy.ConnectionQueueSize = 150

If the issue persists, the most likely reason is that the client application is trying to read and write faster than the server can serve it. The Aerospike client uses one connection per in-flight transaction, so if you are running 100 transactions at the same time, 100 connections are required. If the server is under too much load, it processes transactions more slowly, so each transaction holds its connection for longer. To sustain the same load, the client then needs to open more connections. In general, if increasing ConnectionQueueSize doesn't resolve the issue, the most likely cause is sizing: the servers are hitting some throughput limit, such as a network or disk bottleneck.

To troubleshoot this, you can either use standard Linux tools such as atop, iotop, top, mpstat and iostat, or enable microbenchmarks in Aerospike: https://aerospike.com/docs/server/operations/monitor/latency

With microbenchmarks enabled, you can then use tools such as aerolab-agi (for example: aerolab agi create --source-local /path/to/aerospike/logs) to view the results in Grafana. https://github.com/aerospike/aerolab

I have also noticed you have "LimitConnectionsToQueueSize": true. This caps the number of connections at the queue size, so at most 150 connections (the queue size applies per server node). Setting it to false allows more connections to be opened, though if the underlying issue is a server, network or disk bottleneck, setting it to false could make things worse.

Regarding MaxRetries: the error states that the retry limit of 0 was reached (no retries allowed), meaning the first and only attempt failed. If you are getting these timeouts without any retries, this further reinforces the bottleneck theory above.

2 Comments

dhamu Over a year ago

So are you saying the issue is that the server node is not handling the volume efficiently? I am looking into the goroutines we spawn. Since it's a highly concurrent environment, I'd guess the client is struggling to keep the number of connections under the limit when a burst of goroutines tries to read from Aerospike at once.

Robert Glonek Over a year ago

If you are spawning goroutines to handle the reads/writes, you may want to use a channel-based counter to check how many you are spawning; in that case there could indeed be too many goroutines. To give you an idea, on my 4-core, 16 GiB RAM machine I have settings similar to yours, but with a ConnectionQueueSize of 1000 and up to 1000 goroutines spawned at any given time, and the client machine isn't breaking a sweat. I think the first step would be to check disk and network load on the server, and the number of reads/writes per second. Then, if the server and network are OK, consider increasing ConnectionQueueSize.
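A channel-based counter like the one suggested above can be sketched in Go as a buffered channel used as a counting semaphore, plus an atomic gauge that records the peak number of reads in flight. This is a hypothetical sketch: `fakeRead` stands in for a real client read call, and the limit of 100 is arbitrary; in practice you might choose the limit in line with ConnectionQueueSize.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// fakeRead is a hypothetical stand-in for an Aerospike read call.
func fakeRead() { time.Sleep(2 * time.Millisecond) }

// boundedReads launches n reads but allows at most limit to run at once,
// using a buffered channel as a counting semaphore. It returns the peak
// number of concurrent reads observed.
func boundedReads(n, limit int) int64 {
	sem := make(chan struct{}, limit)
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit reads are already running
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this read finishes
			cur := atomic.AddInt64(&inFlight, 1)
			// Record a new peak with compare-and-swap so concurrent updates are safe.
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			fakeRead()
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak concurrent reads:", boundedReads(1000, 100))
}
```

Because each goroutine must take a slot before it starts and decrements the gauge before giving the slot back, the reported peak can never exceed the limit, so a burst of requests queues in the application instead of demanding more connections than the pool allows.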