
We are encountering this exception (Couchbase.Core.NodeUnavailableException; the full stack trace is below) very often in our production code, without any increase in the number of requests to Couchbase or any memory pressure on the server itself. The node has been allocated 30 GB of RAM and uses at most 3 GB, but every now and then this exception is thrown. The bucket is opened only once per application lifetime, and only get and upsert operations are performed afterwards. The connection is initialised like this:

Config = new ClientConfiguration()
{
    Servers = serverList,
    UseSsl = false,
    DefaultOperationLifespan = 2500,
    BucketConfigs = new Dictionary<string, BucketConfiguration>
    {
        {
            bucketName, new BucketConfiguration
            {
                BucketName = bucketName,
                UseSsl = false,
                DefaultOperationLifespan = 2500,
                PoolConfiguration = new PoolConfiguration
                {
                    MaxSize = 2000,
                    MinSize = 200,
                    SendTimeout = (int)Configuration.Config.Instance.CouchbaseConfig.Timeout
                }
            }
        }
    }
};

Cluster = new Cluster(Config);
Bucket = Cluster.OpenBucket();

Can you please let me know whether this initialisation is correct and, more importantly, what to check on the Couchbase server to find the cause of this issue? I have checked all the logs on the server but could not find anything unusual at the times these errors are thrown.

Thank you,

Stack trace:

System.Exception: Couchbase exception
at ###.DataLayer.Couchbase.CouchbaseUserOperations.Get()
at ###.API.Services.BaseService`1.SetUserID()
at ###.API.Services.EventsService+<GetResponse>d__0.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.EventsService.GetResponse()
at ###.API.Services.BaseService`1+<Any>d__28.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.BaseService`1.Any()
at lambda_method()
at ServiceStack.Host.ServiceRunner`1.Execute()
at ServiceStack.Host.ServiceRunner`1.Process()
at ServiceStack.Host.ServiceExec`1.Execute()
at ServiceStack.Host.ServiceRequestExec`2.Execute()
at ServiceStack.Host.ServiceController.ManagedServiceExec()
at ServiceStack.Host.ServiceController+<>c__DisplayClass11.<RegisterServiceExecutor>b__f()
at ServiceStack.Host.ServiceController.Execute()
at ServiceStack.HostContext.ExecuteService()
at ServiceStack.Host.RestHandler.ProcessRequestAsync()
at ServiceStack.Host.Handlers.HttpAsyncTaskHandler.System.Web.IHttpAsyncHandler.BeginProcessRequest()
at System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep()
at System.Web.HttpApplication+PipelineStepManager.ResumeSteps()
at System.Web.HttpApplication.BeginProcessRequestNotification()
at System.Web.HttpRuntime.ProcessRequestNotificationPrivate()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()
Caused by: System.Exception : Couchbase.Core.NodeUnavailableException: The node 172.31.34.105:11210 that the key was mapped to is either down or unreachable. The SDK will continue to try to connect every 1000ms. Until it can connect every operation routed to it will fail with this exception.
at ###.DataLayer.Couchbase.CouchbaseUserOperations.Get()
at ###.API.Services.BaseService`1.SetUserID()
at ###.API.Services.EventsService+<GetResponse>d__0.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.EventsService.GetResponse()
at ###.API.Services.BaseService`1+<Any>d__28.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start()
at ###.API.Services.BaseService`1.Any()
at lambda_method()
at ServiceStack.Host.ServiceRunner`1.Execute()
at ServiceStack.Host.ServiceRunner`1.Process()
at ServiceStack.Host.ServiceExec`1.Execute()
at ServiceStack.Host.ServiceRequestExec`2.Execute()
at ServiceStack.Host.ServiceController.ManagedServiceExec()
at ServiceStack.Host.ServiceController+<>c__DisplayClass11.<RegisterServiceExecutor>b__f()
at ServiceStack.Host.ServiceController.Execute()
at ServiceStack.HostContext.ExecuteService()
at ServiceStack.Host.RestHandler.ProcessRequestAsync()
at ServiceStack.Host.Handlers.HttpAsyncTaskHandler.System.Web.IHttpAsyncHandler.BeginProcessRequest()
at System.Web.HttpApplication+CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep()
at System.Web.HttpApplication+PipelineStepManager.ResumeSteps()
at System.Web.HttpApplication.BeginProcessRequestNotification()
at System.Web.HttpRuntime.ProcessRequestNotificationPrivate()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.UnsafeIISMethods.MgdIndicateCompletion()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotificationHelper()
at System.Web.Hosting.PipelineRuntime.ProcessRequestNotification()
  • Do you have a stack-trace? Commented Jul 12, 2015 at 8:25
  • Hi @rene. I have now updated the question with the stack trace. Thank you. Commented Jul 12, 2015 at 8:29
  • I'm not a Couchbase user, but I expect you need to look into network connectivity. Not much is wrong with your client-side code or server-side setup; instead, one of the network components between your client and server is temporarily rejecting connections. Commented Jul 12, 2015 at 8:35
  • Hi @rene. The server is hosted on a large AWS instance with very high network throughput, so I don't see how this could be the issue, and the average load is only around 300 ops/sec. Commented Jul 13, 2015 at 17:28

2 Answers


A NodeUnavailableException can be returned for any number of network-related issues. However, since you mentioned you are running on AWS, it's likely that the TCP keep-alive settings need to be tuned on the client.

Your MinSize (200 connections) is so large that you are likely not using all of those connections, and the idle ones sit there until the AWS load balancer decides to close them. When this happens, the SDK temporarily marks the node that failed as down (for 1000 ms) and then tries to reconnect. During this time, any keys mapped to that node fail with that exception.

This blog post describes how to set the TCP keep-alive time and interval: http://blog.couchbase.com/introducing-couchbase-.net-sdk-2.1.0-the-asynchronous-couchbase-.net-client

var config = new ClientConfiguration
{
    EnableTcpKeepAlives = true,        // default is true
    TcpKeepAliveTime = 1000 * 60 * 60, // 60 minutes (in ms) of idle time before the first keep-alive
    TcpKeepAliveInterval = 5000        // after that, a keep-alive is sent every 5 seconds
};
var cluster = new Cluster(config);
var bucket = cluster.OpenBucket();

That assumes you are using version 2.1.0 or later of the client. If you are not, you can do it through the ServicePointManager:

using System.Net;

// keep-alive time of 200 seconds (200000 ms), then a keep-alive every second (1000 ms)
ServicePointManager.SetTcpKeepAlive(true, 200000, 1000);

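Either way, set the keep-alive time to a value lower than the AWS load balancer's idle timeout (I believe it's 60 seconds). Both example values above (60 minutes and 200 seconds) are longer than that, so they would need to be reduced.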
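To make the two timing parameters concrete: on Windows, per-socket keep-alive timing is set through the SIO_KEEPALIVE_VALS control code, exposed in .NET as IOControlCode.KeepAliveValues. It takes a 12-byte structure holding an on/off flag, the idle time before the first keep-alive (ms), and the interval between keep-alives (ms). The sketch below builds that structure and applies it to a raw Socket. It is not the Couchbase SDK's code; we are only assuming that the SDK's TcpKeepAliveTime and TcpKeepAliveInterval settings map onto the same two values. KeepAliveSketch and its methods are hypothetical names.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper, not part of the Couchbase SDK.
public static class KeepAliveSketch
{
    // Builds the Windows tcp_keepalive structure:
    // u_long onoff, u_long keepalivetime (ms), u_long keepaliveinterval (ms).
    // BitConverter uses the machine's byte order, which matches the native struct.
    public static byte[] BuildKeepAliveValues(bool enabled, uint keepAliveTimeMs, uint keepAliveIntervalMs)
    {
        var values = new byte[12];
        BitConverter.GetBytes(enabled ? 1u : 0u).CopyTo(values, 0);
        BitConverter.GetBytes(keepAliveTimeMs).CopyTo(values, 4);
        BitConverter.GetBytes(keepAliveIntervalMs).CopyTo(values, 8);
        return values;
    }

    // Enables keep-alives on the socket and sets the timing.
    // IOControlCode.KeepAliveValues is Windows-specific; other platforms may throw.
    public static void Apply(Socket socket, uint keepAliveTimeMs, uint keepAliveIntervalMs)
    {
        socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
        socket.IOControl(
            IOControlCode.KeepAliveValues,
            BuildKeepAliveValues(true, keepAliveTimeMs, keepAliveIntervalMs),
            null);
    }
}
```

For example, to stay under a 60-second idle timeout you might call KeepAliveSketch.Apply(socket, 30000, 5000): the first keep-alive goes out after 30 seconds without traffic, and then one every 5 seconds.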

You should probably also set your connection pool min and max a bit lower, e.g. 5 and 10.
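Applied to the question's configuration, only the pool sizes change. The fragment below is the PoolConfiguration part of the question's BucketConfiguration with the values suggested above; everything else stays as in the question.

```csharp
// Only MinSize and MaxSize differ from the question's settings.
PoolConfiguration = new PoolConfiguration
{
    MinSize = 5,
    MaxSize = 10,
    SendTimeout = (int)Configuration.Config.Instance.CouchbaseConfig.Timeout
}
```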


7 Comments

Hello @jeffrymorris. Thank you for your answer, but unfortunately your suggested changes did not solve the issue. The Couchbase server is not behind an AWS load balancer, so that cannot be the source. We have also decreased the number of connections, but still no luck. The Couchbase server is installed on an Ubuntu instance. Do you know if we need to modify anything in the OS?
We have monitored the TCP connections, and with a minimum of 5 connections and a maximum of 20 we see only 3 ports open (dropbox.com/s/fkw0rika8a8wtv1/…). A couple of minutes earlier there were 4 ports, and when the exception occurred one of them disappeared. The main issue is that the connections are not respawned, and when that happens we get extremely slow responses from the DB. What do you think?
@RaduCotofana - which version of the server are you using? Also, if you enable client-side logging (docs.couchbase.com/developer/dotnet-2.1/setting-up-logging.html), you should be able to log the actual exception that triggered the NodeUnavailableException. The slow response is likely the connection timing out and failing, then rebuilding itself, which takes ~15-20 s.
I have found the issue, and it lies in the Couchbase .NET client. The SendTimeout was previously set to 50 ms and is now set to 500 ms, but the issue persists. The exception thrown is: "The connection has timed out while an operation was in flight". Once this occurs, the .NET client marks the connection as dead and does not open a replacement. The pool is currently set to a minimum of 20 and a maximum of 200; the app starts with 20 TCP connections open, and every time this exception occurs the number of connections decreases and never returns to the minimum. Do you know anything about this?
First, I don't understand why it times out at 500 ms, and second, I don't understand why the connections are not being rebuilt.

Even though the problem was not fully solved (we still encounter timeouts, but at a lower rate), we improved performance by using the ClusterHelper singleton instance as follows:

 ClusterHelper.Initialize(
            new ClientConfiguration
            {
                Servers = serverList,
                UseSsl = false,
                DefaultOperationLifespan = 2500,
                EnableTcpKeepAlives = true,
                TcpKeepAliveTime = 1000*60*60,
                TcpKeepAliveInterval = 5000,
                BucketConfigs = new Dictionary<string, BucketConfiguration>
                {
                    {
                        "default",
                        new BucketConfiguration
                        {
                            BucketName = "default",
                            UseSsl = false,
                            Password = "",
                            PoolConfiguration = new PoolConfiguration
                            {
                                MaxSize = 50,
                                MinSize = 10
                            }
                        }
                    }
                }
            });
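Once ClusterHelper has been initialised (once, at application start-up), the shared bucket can be retrieved wherever it is needed instead of keeping a separate Cluster/Bucket pair. The sketch below assumes the 2.x SDK's ClusterHelper.GetBucket and ClusterHelper.Close, and the Success, Status, Message and Exception members of IOperationResult<T>. Checking those on a failed get is one way to see what went wrong, alongside the client-side logging mentioned in the comments. The key "user::123" and the string document type are hypothetical.

```csharp
using System;
using Couchbase;

// In the request path: reuse the bucket managed by ClusterHelper.
var bucket = ClusterHelper.GetBucket("default");

// Hypothetical key and document type, for illustration only.
var result = bucket.Get<string>("user::123");
if (!result.Success)
{
    // Status and Message summarise the failure; Exception holds the
    // exception the SDK recorded for this operation, if any.
    Console.WriteLine("{0}: {1}", result.Status, result.Message);
    Console.WriteLine(result.Exception);
}

// At application shutdown (e.g. Application_End): dispose the shared cluster.
ClusterHelper.Close();
```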
