io.netty.channel.unix.Errors$NativeIoException: Connection reset by peer in Spring Cloud Gateway with WebFlux routing to GraphQL backend in Kubernetes

Asked 4 months ago

Modified 4 months ago

Viewed 177 times

I'm experiencing a Connection reset by peer exception in my Spring Boot 3.4.4 application that uses Spring Cloud Gateway with WebFlux to serve a web application. The entire stack is deployed in a Kubernetes cluster using Helm charts with multiple interconnected services.

Kubernetes Architecture Overview

Cluster: Azure Kubernetes Service (AKS)
Services deployed via Helm charts:
- webclient (Spring Cloud Gateway): Main application with OAuth2/JWT security
- webclient-graphql (Node.js Apollo Server): GraphQL API layer on port 4000
- webclient-nginx (Nginx): Frontend proxy serving React application

Network Configuration:

Service Discovery: Kubernetes ClusterIP services
Load Balancing: Built-in Kubernetes service load balancing
Network Policies: Configured to allow traffic from public nginx namespace

Request Flow in Kubernetes

External Traffic → Spring Cloud Gateway (fwoweb) → GraphQL Service (Node.js) → Backend APIs

Inter-service Communication:

Services communicate via Kubernetes internal DNS (e.g., fwo-webclient-graphql.vsmds-flashware-npr.svc:4000)
All services run with security contexts (non-root users, privilege escalation disabled)
Resource limits and requests configured for CPU/memory

Current Configuration

Dockerfile

ENTRYPOINT ["java", "-Dreactor.netty.pool.maxIdleTime=30000", "-Dreactor.netty.pool.maxLifeTime=60000",\
    "-javaagent:/datadog/agent/javaagent.jar", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/tmp/heap/", \
    "-cp", ".", "org.springframework.boot.loader.launch.JarLauncher"]

Spring Cloud Gateway HTTP Client Pool Settings:

spring:
  cloud:
    gateway:
      httpclient:
        pool:
          max-idle-time: 30s
          max-life-time: 60s
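
To make the comparison between the two configuration sources concrete, here is a minimal sketch that checks whether the JVM flags in the ENTRYPOINT and the YAML values above describe the same durations. It assumes the reactor.netty.pool.* system properties are plain millisecond counts, and it uses a simplified, hypothetical parser (fromYaml) as a stand-in for Spring Boot's duration conversion; the class and method names are illustrative only and not part of any library.

```java
import java.time.Duration;

public class PoolSettingsCheck {

    // Assumption: the -Dreactor.netty.pool.* values are millisecond counts.
    static Duration fromJvmFlag(String millis) {
        return Duration.ofMillis(Long.parseLong(millis));
    }

    // Hypothetical, simplified stand-in for Spring Boot's duration binding:
    // it only understands whole seconds written as "<n>s".
    static Duration fromYaml(String value) {
        if (!value.endsWith("s")) {
            throw new IllegalArgumentException("expected a value like 30s: " + value);
        }
        return Duration.ofSeconds(Long.parseLong(value.substring(0, value.length() - 1)));
    }

    public static void main(String[] args) {
        // Compare the idle-time and life-time values from both sources.
        System.out.println("idle times match: "
                + fromJvmFlag("30000").equals(fromYaml("30s")));
        System.out.println("life times match: "
                + fromJvmFlag("60000").equals(fromYaml("60s")));
    }
}
```

Under those assumptions both sources request the same limits, so the duplication would be redundant rather than contradictory; which one the gateway actually applies is a separate question.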

Key Dependencies:

Spring Boot 3.4.4
Spring Cloud 2024.0.1
Spring Cloud Gateway
Spring WebFlux
Spring Security OAuth2
Redis for session management
Netty (via WebFlux)

Error Details

The exception occurs during GraphQL requests routed through the gateway in the Kubernetes environment:


io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed: Connection reset by peer
Suppressed: The stacktrace has been enhanced by Reactor, refer to additional information below:
Error has been observed at the following site(s):
*__checkpoint ⇢ ...
*__checkpoint ⇢ ...
*__checkpoint ⇢ ...[DefaultWebFilterChain]
*__checkpoint ⇢ HTTP POST "/graphql" [ExceptionHandlingWebHandler]
Original Stack Trace:
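
For context on what this error means at the TCP level, the following is a minimal sketch that provokes a reset using only JDK sockets. It is not the gateway's code path: the class name ResetDemo and the use of SO_LINGER with a zero timeout are assumptions chosen to force the peer to send an RST instead of a normal close, which is one way a reader ends up with a "connection reset" failure.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

public class ResetDemo {

    // Returns the message of the SocketException seen by the client,
    // or null if the read did not fail.
    static String provokeReset() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread peer = new Thread(() -> {
                try {
                    Socket accepted = server.accept();
                    // Linger enabled with timeout 0: close() aborts the
                    // connection with an RST rather than a graceful FIN.
                    accepted.setSoLinger(true, 0);
                    accepted.close();
                } catch (IOException ignored) {
                    // Nothing to do in this sketch.
                }
            });
            peer.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                peer.join(); // wait until the peer has aborted the connection
                client.getInputStream().read(); // reading now hits the reset
                return null;
            } catch (SocketException e) {
                return e.getMessage();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("client saw: " + provokeReset());
    }
}
```

The point of the sketch is only that the reset originates from the remote side of an already established connection; in the setup described here that remote side would be whatever the gateway's pooled connection is talking to.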

Questions

1. Configuration Conflict: I have Reactor Netty pool settings configured in both places:
   - Dockerfile ENTRYPOINT: -Dreactor.netty.pool.maxIdleTime=30000 -Dreactor.netty.pool.maxLifeTime=60000
   - Spring YAML: max-idle-time: 30s and max-life-time: 60s

   Could this double configuration be causing connection management issues? Which takes precedence?
2. Kubernetes Service Discovery: Could the connection resets be related to Kubernetes service discovery or pod-to-pod communication issues? The GraphQL service runs on a separate pod with its own ClusterIP service.
3. General: What else can cause the issue?
4. DNS and Network Policies: Could the single-request-reopen DNS configuration or network policies be interfering with persistent connections between the gateway and GraphQL services?

edited Jun 30 at 12:04

asked Jun 30 at 11:43

stephank95


Please edit the question to only ask one question at a time. Avoid very vague questions like "what else can cause the issue" or speculation on random causes. It may help to focus on a specific code sample, a specific request, and a specific error, including all of the necessary code as part of a minimal reproducible example; the extended description of the very-high-level environment doesn't give more clarity to the immediate error.

– David Maze, 2025-06-30 12:44:23 +00:00

(Just guessing on a "connection reset" error: the target Service isn't correctly bound to its Pods; you have a TLS-or-not mismatch; there's some sort of code-level error in the backend process causing connections to just get dropped instead of returning a GraphQL error or an HTTP 500. Just from what you've described it's not obviously related to Kubernetes networking setup – "one service HTTP calls another" is extremely routine – and a DNS problem wouldn't surface as a TCP connection being established and then dropped.)

– David Maze, 2025-06-30 12:46:38 +00:00
