I'm experiencing a "Connection reset by peer" exception in my Spring Boot 3.4.4 application, which uses Spring Cloud Gateway with WebFlux to serve a web application. The entire stack is deployed in a Kubernetes cluster using Helm charts, with multiple interconnected services.

Kubernetes Architecture Overview

  • Cluster: Azure Kubernetes Service (AKS)
  • Services deployed via Helm charts:
    • webclient (Spring Cloud Gateway): Main application with OAuth2/JWT security
    • webclient-graphql (Node.js Apollo Server): GraphQL API layer on port 4000
    • webclient-nginx (Nginx): Frontend proxy serving React application

Network Configuration:

  • Service Discovery: Kubernetes ClusterIP services
  • Load Balancing: Built-in Kubernetes service load balancing
  • Network Policies: Configured to allow traffic from public nginx namespace

Request Flow in Kubernetes

External Traffic → Spring Cloud Gateway (fwoweb) → GraphQL Service (Node.js) → Backend APIs

Inter-service Communication:

  • Services communicate via Kubernetes internal DNS (e.g., fwo-webclient-graphql.vsmds-flashware-npr.svc:4000); the gateway route using this DNS name is sketched after this list
  • All services run with security contexts (non-root users, privilege escalation disabled)
  • Resource limits and requests configured for CPU/memory
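
For context, the gateway route that forwards GraphQL traffic looks roughly like the sketch below. The route id and predicate are simplified placeholders; only the URI (the internal DNS name above) is taken from the real setup:

spring:
  cloud:
    gateway:
      routes:
        - id: graphql-route            # placeholder id
          uri: http://fwo-webclient-graphql.vsmds-flashware-npr.svc:4000
          predicates:
            - Path=/graphql            # simplified; the real predicates may differ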

Current Configuration

Docker Image ENTRYPOINT

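# The -Dreactor.netty.pool.* flags below set Reactor Netty's global pool system properties (values in milliseconds);
# the remaining flags cover the Datadog agent, heap dumps on OOM, and the Spring Boot launcher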
ENTRYPOINT ["java", "-Dreactor.netty.pool.maxIdleTime=30000", "-Dreactor.netty.pool.maxLifeTime=60000",\
    "-javaagent:/datadog/agent/javaagent.jar", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/tmp/heap/", \
    "-cp", ".", "org.springframework.boot.loader.launch.JarLauncher"]

Spring Cloud Gateway HTTP Client Pool Settings:

spring:
  cloud:
    gateway:
      httpclient:
        pool:
          max-idle-time: 30s
          max-life-time: 60s
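
For completeness, the same pool section with background eviction made explicit would look like the sketch below. eviction-interval is not set in my current config, so as I understand it, stale connections are only evicted when they are next acquired (the 30s value here is an assumption, not a recommendation):

spring:
  cloud:
    gateway:
      httpclient:
        pool:
          max-idle-time: 30s
          max-life-time: 60s
          eviction-interval: 30s   # assumption: evict idle connections in the background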

Key Dependencies:

  • Spring Boot 3.4.4
  • Spring Cloud 2024.0.1
  • Spring Cloud Gateway
  • Spring WebFlux
  • Spring Security OAuth2
  • Redis for session management
  • Netty (via WebFlux)

Error Details

The exception occurs during GraphQL requests routed through the gateway in the Kubernetes environment:


io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed: Connection reset by peer
Suppressed: The stacktrace has been enhanced by Reactor, refer to additional information below:
Error has been observed at the following site(s):
*__checkpoint ⇢ ...
*__checkpoint ⇢ ...
*__checkpoint ⇢ ...[DefaultWebFilterChain]
*__checkpoint ⇢ HTTP POST "/graphql" [ExceptionHandlingWebHandler]
Original Stack Trace:

Questions

  1. Configuration Conflict: I have Reactor Netty pool settings configured in both places:

    • Dockerfile ENTRYPOINT: -Dreactor.netty.pool.maxIdleTime=30000 -Dreactor.netty.pool.maxLifeTime=60000
    • Spring YAML: max-idle-time: 30s and max-life-time: 60s

    Could this double configuration be causing connection management issues? Which takes precedence? (A sketch of how I plan to inspect the bound values follows after this list.)

  2. Kubernetes Service Discovery: Could the connection resets be related to Kubernetes service discovery or pod-to-pod communication issues? The GraphQL service runs on a separate pod with its own ClusterIP service.

  3. General: Beyond the settings above, what other common causes of connection resets between a pooled Reactor Netty client and an upstream Node.js service should I rule out (e.g., keep-alive timeout mismatches)?

  4. DNS and Network Policies: Could the single-request-reopen DNS configuration or network policies be interfering with persistent connections between the gateway and GraphQL services? (A sketch of the NetworkPolicy shape I would expect follows below.)
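
For question 1, this is a minimal sketch of how I plan to inspect the bound values. It assumes spring-boot-starter-actuator is on the classpath; GET /actuator/configprops should then show the effective spring.cloud.gateway.httpclient.pool properties:

management:
  endpoints:
    web:
      exposure:
        include: configprops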

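For question 4, this is roughly the NetworkPolicy shape I would expect to be required for gateway-to-GraphQL traffic. All names and labels below are assumptions for illustration, not my actual manifests; only the namespace and port come from the setup described above:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-to-graphql       # hypothetical name
  namespace: vsmds-flashware-npr
spec:
  podSelector:
    matchLabels:
      app: webclient-graphql           # assumed pod label
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: webclient           # assumed gateway pod label
      ports:
        - protocol: TCP
          port: 4000
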
Comments
  • Please edit the question to only ask one question at a time. Avoid very vague questions like "what else can cause the issue" or speculation on random causes. It may help to focus on a specific code sample, a specific request, and a specific error, including all of the necessary code as part of a minimal reproducible example; the extended description of the very-high-level environment doesn't give more clarity to the immediate error. Commented Jun 30 at 12:44
  • (Just guessing on a "connection reset" error: the target Service isn't correctly bound to its Pods; you have a TLS-or-not mismatch; there's some sort of code-level error in the backend process causing connections to just get dropped instead of returning a GraphQL error or an HTTP 500. Just from what you've described it's not obviously related to Kubernetes networking setup – "one service HTTP calls another" is extremely routine – and a DNS problem wouldn't surface as a TCP connection being established and then dropped.) Commented Jun 30 at 12:46

