
We run a Meteor application that requires sticky sessions. We recently refactored our infrastructure and have started seeing connectivity issues.

Old Setup (Stable): A single EC2 instance running Docker Compose, Traefik, and CloudFront. DNS was an A Record pointing directly to the EC2 instance. Container IPs were relatively static during updates.

New Setup (Unstable): We migrated to Docker Swarm behind an AWS Network Load Balancer (NLB). DNS is now a CNAME pointing to the NLB. The NLB forwards TCP traffic to the Swarm, where Traefik handles ingress.

The Problem: Since the migration, users frequently report 502 Bad Gateway errors or broken connections immediately after a deployment.

We suspect the issue lies in the interaction between CloudFront's caching, Traefik's sticky cookies, and the volatility of Swarm tasks. This seems similar to the issue discussed in this Traefik community thread.

Our theory is:

Sticky Cookies: Traefik sets a traefiksession cookie to bind a user to a specific container (Swarm Task).

Cache Poisoning: CloudFront is currently configured to forward cookies. It caches static assets (like app.js) associated with a specific traefiksession.

The Conflict: When we deploy, Swarm kills the old containers (Container A) and starts new ones (Container B).

The Error: Users receive the cached asset which binds them to "Container A". When they make a subsequent request, Traefik tries to route them to "Container A", which no longer exists. The NLB (acting as a dumb pipe) cannot help, and Traefik returns a 502.
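To make the theory concrete, here is a minimal Python sketch of the sequence we think is happening. It is a toy model, not real CloudFront or Traefik code: the classes `Origin` and `EdgeCache`, the task names, and the `/sockjs` path are hypothetical, and the rule "a cookie naming a dead task yields a 502" is our assumption, not confirmed Traefik behavior.

```python
# Toy model of the theory above; not real CloudFront or Traefik code.
# All names here (Origin, EdgeCache, task ids, paths) are hypothetical.


class Origin:
    """Stands in for Traefik routing to Swarm tasks via a sticky cookie."""

    def __init__(self, tasks):
        self.tasks = set(tasks)

    def handle(self, path, cookie):
        if cookie is not None and cookie not in self.tasks:
            # Assumption from our theory: a cookie naming a dead task gives a 502.
            return {"status": 502, "set_cookie": None}
        # No cookie yet: pick a live task and bind the client to it.
        task = cookie if cookie is not None else sorted(self.tasks)[0]
        return {"status": 200, "set_cookie": task}


class EdgeCache:
    """Stands in for CloudFront caching a response along with its Set-Cookie."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, path, cookie):
        key = (path, cookie)  # the cookie is part of the cache key
        if key not in self.store:
            response = self.origin.handle(path, cookie)
            if response["status"] != 200:
                return response  # errors are not cached in this toy model
            self.store[key] = response
        return self.store[key]


def simulate_deploy():
    origin = Origin({"task-a"})
    cache = EdgeCache(origin)
    cache.get("/app.js", None)          # cached while task-a is alive
    origin.tasks = {"task-b"}           # deployment replaces task-a with task-b
    stale = cache.get("/app.js", None)  # a new visitor gets the old cached entry
    # The visitor's next uncached request carries the stale sticky cookie.
    follow_up = origin.handle("/sockjs", stale["set_cookie"])
    return stale["set_cookie"], follow_up["status"]
```

In this model the visitor ends up holding a cookie for `task-a` after the deployment, and the follow-up request fails with a 502, which matches what our users report.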

The Question: Is it standard best practice in this "Swarm + Traefik + CloudFront" stack to explicitly strip the traefiksession cookie (and all other cookies) from the CloudFront cache key for static assets (e.g., *.js, *.css)?

Specifically:

Will removing the cookie for static files force CloudFront to fetch from any healthy container, bypassing the "dead" sticky session?

Does this maintain compatibility with Meteor's client-side refreshing (Hot Code Push)?

We are looking for validation that splitting behavior (Cookies for HTML, No Cookies for Assets) is the correct fix for these post-deployment 502s.
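The split we have in mind can be sketched as a cache-key function in Python. This only illustrates the intended behavior and is not CloudFront configuration: `STATIC_PATTERNS` and `cache_key` are hypothetical names, and the pattern list is an assumption based on the examples above.

```python
from fnmatch import fnmatchcase

# Hypothetical list of path patterns we would treat as static assets.
STATIC_PATTERNS = ("*.js", "*.css")


def cache_key(path, cookies):
    """Return the key a cache would use for this request under the split policy."""
    if any(fnmatchcase(path, pattern) for pattern in STATIC_PATTERNS):
        # Static assets: cookies are ignored, so every user shares one entry.
        return (path,)
    # Everything else (e.g. HTML): cookies stay in the key.
    return (path, tuple(sorted(cookies.items())))
```

With this policy, `/app.js` requested with two different traefiksession values maps to the same key, while `/` still maps to one key per cookie value.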

Some screenshots of our CloudFront settings.

This is the SameSite issue seen in the inspector.
