I'm experiencing a Connection reset by peer exception in my Spring Boot 3.4.4 application that uses Spring Cloud Gateway with WebFlux to serve a web application. The entire stack is deployed in a Kubernetes cluster using Helm charts with multiple interconnected services.
Kubernetes Architecture Overview
- Cluster: Azure Kubernetes Service (AKS)
- Services deployed via Helm charts:
  - webclient (Spring Cloud Gateway): Main application with OAuth2/JWT security
  - webclient-graphql (Node.js Apollo Server): GraphQL API layer on port 4000
  - webclient-nginx (Nginx): Frontend proxy serving React application
Network Configuration:
- Service Discovery: Kubernetes ClusterIP services
- Load Balancing: Built-in Kubernetes service load balancing
- Network Policies: Configured to allow traffic from the public nginx namespace
Request Flow in Kubernetes
External Traffic → Spring Cloud Gateway (fwoweb) → GraphQL Service (Node.js) → Backend APIs
Inter-service Communication:
- Services communicate via Kubernetes internal DNS (e.g., fwo-webclient-graphql.vsmds-flashware-npr.svc:4000)
- All services run with security contexts (non-root users, privilege escalation disabled)
- Resource limits and requests configured for CPU/memory
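To rule out name resolution as a factor, a check along these lines could be run from inside the gateway pod. This is only a minimal sketch using the JDK resolver: the class name ServiceDnsCheck and its resolve helper are hypothetical, and the host is expected as a command-line argument (for example the internal service name shown above, without the port).

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic helper, not part of the application.
final class ServiceDnsCheck {

    // Returns the distinct IP addresses the JVM resolves for the host,
    // or an empty list if the name cannot be resolved.
    static List<String> resolve(String host) {
        try {
            return Arrays.stream(InetAddress.getAllByName(host))
                    .map(InetAddress::getHostAddress)
                    .distinct()
                    .toList();
        } catch (UnknownHostException e) {
            return List.of();
        }
    }

    public static void main(String[] args) {
        // Pass the service host name as the first argument; defaults to localhost.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + resolve(host));
    }
}
```

For a ClusterIP service I would expect a single, stable virtual IP here. An empty or changing result would point toward DNS rather than connection pooling.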
Current Configuration
Dockerfile
ENTRYPOINT ["java", "-Dreactor.netty.pool.maxIdleTime=30000", "-Dreactor.netty.pool.maxLifeTime=60000", \
    "-javaagent:/datadog/agent/javaagent.jar", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/tmp/heap/", \
    "-cp", ".", "org.springframework.boot.loader.launch.JarLauncher"]
Spring Cloud Gateway HTTP Client Pool Settings:
spring:
  cloud:
    gateway:
      httpclient:
        pool:
          max-idle-time: 30s
          max-life-time: 60s
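As a sanity check that the two configurations at least describe the same timeouts, the values can be compared as durations. This is a minimal sketch under two assumptions: that the reactor.netty.pool.* system properties are interpreted as milliseconds, and that the YAML values use Spring Boot's simple duration format. The PoolTimeouts class and its small parser are hypothetical stand-ins, not the code either framework actually uses.

```java
import java.time.Duration;

// Hypothetical helper for comparing the two ways the pool timeouts are written.
final class PoolTimeouts {

    // Assumption: the -Dreactor.netty.pool.* values are plain millisecond counts.
    static Duration fromMillis(String value) {
        return Duration.ofMillis(Long.parseLong(value.trim()));
    }

    // Simplified stand-in for the "30s" style used in the YAML;
    // only the "ms" and "s" suffixes are handled here.
    static Duration fromSimple(String value) {
        String v = value.trim();
        if (v.endsWith("ms")) {
            return Duration.ofMillis(Long.parseLong(v.substring(0, v.length() - 2)));
        }
        if (v.endsWith("s")) {
            return Duration.ofSeconds(Long.parseLong(v.substring(0, v.length() - 1)));
        }
        throw new IllegalArgumentException("Unsupported duration: " + value);
    }

    public static void main(String[] args) {
        System.out.println("idle equal: " + fromMillis("30000").equals(fromSimple("30s")));
        System.out.println("life equal: " + fromMillis("60000").equals(fromSimple("60s")));
    }
}
```

If both sources resolve to the same durations, the duplication would be redundant rather than contradictory, although this says nothing about which source the gateway actually reads.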
Key Dependencies:
- Spring Boot 3.4.4
- Spring Cloud 2024.0.1
- Spring Cloud Gateway
- Spring WebFlux
- Spring Security OAuth2
- Redis for session management
- Netty (via WebFlux)
Error Details
The exception occurs during GraphQL requests routed through the gateway in the Kubernetes environment:
io.netty.channel.unix.Errors$NativeIoException: recvAddress(..) failed: Connection reset by peer
Suppressed: The stacktrace has been enhanced by Reactor, refer to additional information below:
Error has been observed at the following site(s):
*__checkpoint ⇢ ...
*__checkpoint ⇢ ...
*__checkpoint ⇢ ...[DefaultWebFilterChain]
*__checkpoint ⇢ HTTP POST "/graphql" [ExceptionHandlingWebHandler]
Original Stack Trace:
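For context, the same kind of error can be reproduced outside the cluster with plain JDK sockets. The sketch below is not the gateway's Netty client and makes no claim about the actual cause. It only illustrates, under the assumption that the peer aborts the connection with a TCP RST, what a client observes in that case. The class name ConnectionResetDemo is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical demo: a local server aborts the connection and the client reads from it.
final class ConnectionResetDemo {

    // Returns true if the client's read failed with a SocketException
    // after the server side aborted the connection.
    static boolean clientSeesReset() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setSoTimeout(2000); // avoid blocking forever if nothing arrives

            // SO_LINGER with a zero timeout makes close() abort the connection
            // (RST) instead of performing an orderly shutdown (FIN).
            accepted.setSoLinger(true, 0);
            accepted.close();

            try {
                InputStream in = client.getInputStream();
                in.read();    // waits for data, EOF, or the reset
                return false; // read() returned normally: no reset was observed
            } catch (SocketException e) {
                return true;  // typically reported as "Connection reset"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("client saw reset: " + clientSeesReset());
    }
}
```

In my case the reset is reported by the gateway while it reads from the GraphQL service, so the open question is which side (the Node.js pod or something on the network path in between) is aborting the connection.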
Questions
Configuration Conflict: I have Reactor Netty pool settings configured in two places:
- Dockerfile ENTRYPOINT: -Dreactor.netty.pool.maxIdleTime=30000 -Dreactor.netty.pool.maxLifeTime=60000
- Spring YAML: max-idle-time: 30s and max-life-time: 60s
Could this double configuration be causing connection management issues? Which takes precedence?
Kubernetes Service Discovery: Could the connection resets be related to Kubernetes service discovery or pod-to-pod communication issues? The GraphQL service runs on a separate pod with its own ClusterIP service.
General: What else could cause this issue?
DNS and Network Policies: Could the single-request-reopen DNS configuration or network policies be interfering with persistent connections between the gateway and GraphQL services?