I’m trying to route WebSocket connections deterministically to the same backend pod (in a Kubernetes Deployment) based on a query parameter (room/league id). This works with a single ingress-nginx pod, but becomes inconsistent once I scale the ingress controller to multiple nginx pods.
Topology
- Kubernetes (EKS)
- Ingress: ingress-nginx (multiple controller/nginx pods)
- One Deployment exposing two ports behind one Service, with two Ingresses:
  - REST (list rooms)
  - WebSocket (/websocket) that creates/joins a room stored in-memory on the selected pod

Goal

All clients connecting with the same league_id should land on the same backend pod (room state is in-memory).

Current behavior

- With 1 ingress-nginx pod: no added latency; the creator and joiners always land on the same backend pod.
- With >1 ingress-nginx pods: the second user often lands on a different backend pod and the connection fails; we added client retries to “eventually” hit the right one, which adds latency. Sometimes even the first user experiences latency.

Minimal Ingress (WebSocket) config
Using consistent hashing by query param via upstream-hash-by. Separate Ingress for WebSockets to avoid rewrite issues.
annotations:
nginx.ingress.kubernetes.io/affinity: cookie
nginx.ingress.kubernetes.io/affinity-mode: persistent
nginx.ingress.kubernetes.io/large-client-header-buffers: 4 32k
nginx.ingress.kubernetes.io/proxy-buffer-size: 32k
nginx.ingress.kubernetes.io/proxy-buffering: "off"
nginx.ingress.kubernetes.io/proxy-connect-timeout: "3950"
nginx.ingress.kubernetes.io/proxy-http-version: "1.1"
nginx.ingress.kubernetes.io/proxy-read-timeout: "3950"
nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
nginx.ingress.kubernetes.io/proxy-send-timeout: "3950"
nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"
nginx.ingress.kubernetes.io/session-cookie-name: myservice-session
nginx.ingress.kubernetes.io/upstream-hash-by: $arg_league_id
...
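For completeness, this is roughly the shape of the WebSocket Ingress (a sketch: the resource name, host, Service name and port are placeholders; the annotations are the routing-relevant ones from the block above):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myservice-websocket                  # placeholder name
  annotations:
    nginx.ingress.kubernetes.io/upstream-hash-by: $arg_league_id
    nginx.ingress.kubernetes.io/proxy-http-version: "1.1"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3950"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3950"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /websocket
            pathType: Prefix
            backend:
              service:
                name: myservice              # placeholder: the Service in front of the Deployment
                port:
                  number: 8080               # placeholder WebSocket port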
Repro
- User A connects: wss://api.example.com/websocket?league_id=123 → room created on Pod X.
- User B connects: wss://api.example.com/websocket?league_id=123 → intermittently routed to Pod Y when multiple ingress-nginx pods exist; after retries, eventually reaches Pod X.
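For reference, the serving pod can be identified by exposing the pod name to the app via the Downward API so the backend can report it on connect. A minimal sketch (the Deployment, container, image, and POD_NAME env var names are illustrative, not our exact manifest):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myservice                    # placeholder
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myservice
  template:
    metadata:
      labels:
        app: myservice
    spec:
      containers:
        - name: backend
          image: myservice:latest    # placeholder image
          ports:
            - containerPort: 8080    # WebSocket port (placeholder)
          env:
            # Downward API: the Node.js app can log/echo this on each
            # WebSocket connect, making it easy to see whether two
            # clients landed on the same pod.
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name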
What I’ve tried:
- nginx.ingress.kubernetes.io/load-balance: hash
- nginx.ingress.kubernetes.io/upstream-hash-by: $arg_league_id
- Separate Ingress object for WebSockets (no rewrite)
- Verified the Service exposes the correct ws port
- Observed that a single ingress replica fixes the issue; multiple replicas reintroduce the inconsistency.
- Changed nginx versions
- Changed the Deployment into a StatefulSet (didn't help; it was a long-shot attempt, I had run out of ideas)
- Removed the affinity annotation, which didn't help either
- Removed the session cookie
Questions:
- Is upstream-hash-by expected to be consistent across multiple ingress-nginx replicas, or is it only deterministic within a single nginx instance?
- How can I guarantee identical upstream selection across all ingress-nginx pods?
- Is there a way to enforce stable upstream peer ordering so the hash mapping matches on every ingress pod?
- Do I need to avoid service-level load-balancing (e.g., ensure service-upstream=false) or set any specific annotation to keep hashing at the pod endpoint level?
- Should I enable the “consistent” hash ring behavior (if supported) or use the upstream-hash-by-subset annotations? (A sketch of the combination I mean follows this list.)
- If this can’t be made truly consistent at the ingress layer, is the recommended approach to externalize room state (e.g., to Redis, plus a reverse proxy for the WebSocket) and do the smart routing in the backend instead?
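For concreteness, this is the annotation combination the last three questions are asking about, as I currently understand the docs (a sketch; the subset-size value is a guess on my part, and cookie affinity is deliberately dropped so it can’t compete with the hash; the proxy/timeout annotations from the block above would stay as they are):

annotations:
  # hash the upgrade request by the league_id query arg
  nginx.ingress.kubernetes.io/upstream-hash-by: $arg_league_id
  # keep balancing at the pod/Endpoints level, not the Service ClusterIP
  nginx.ingress.kubernetes.io/service-upstream: "false"
  # subset variant of the hash, if that is what yields a stable ring across replicas
  nginx.ingress.kubernetes.io/upstream-hash-by-subset: "true"
  nginx.ingress.kubernetes.io/upstream-hash-by-subset-size: "1"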
Environment: EKS 1.31; ingress-nginx controller 1.13.2 (1.9.5 was also unsuccessful); NLB; backend: Node.js.