
I have microservices (in different programming languages) running on an EC2 instance. In production I notice a few 502 Bad Gateway errors when these services try to interact with each other. Also, the logs of the requested service don't show that any API call was received.

For example, service A calls service B, but nothing in service B's logs indicates that a call came from service A.

Could this be an AWS load balancer issue? Any help would be appreciated. Thanks in advance.

Solution tried: we tried creating HTTP/HTTPS connection agents in each service, but we still get this issue.

Update: In the LB logs, the API call is logged, but the target response code shows "-" whereas the LB response code shows 502 or 504. Does this mean that the LB is not able to handle the traffic, or that my application is not?

Also, what could a possible solution be?
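One way to pull out exactly these entries is to filter the ALB access logs for lines where the load balancer returned 502 or 504 and the target status code is "-". Below is a minimal Python sketch, assuming the Application Load Balancer access log layout (space-separated fields, elb_status_code and target_status_code as the 9th and 10th fields, the quoted request line as the 13th); the function name lb_generated_errors is hypothetical.

```python
from __future__ import annotations

import shlex
from typing import Iterable, Iterator

# 0-based field positions in an ALB access log entry (assumed layout).
TIME, ELB_STATUS, TARGET_STATUS, REQUEST = 1, 8, 9, 12


def lb_generated_errors(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (time, elb_status_code, request) for 502/504 responses that the
    load balancer produced without receiving a response from a target."""
    for line in lines:
        try:
            fields = shlex.split(line)  # keeps quoted fields such as the request line intact
        except ValueError:
            continue  # unbalanced quotes: skip the damaged line
        if len(fields) <= REQUEST:
            continue  # skip blank or truncated lines
        if fields[ELB_STATUS] in ("502", "504") and fields[TARGET_STATUS] == "-":
            yield fields[TIME], fields[ELB_STATUS], fields[REQUEST]
```

ALB delivers its logs to S3 as gzip files, so a downloaded file could be opened with gzip.open(path, "rt") and passed straight to the function.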

  • You can enable LB logs. If traffic passes through the LB correctly, you will be able to see it in the output. Or post the logs here. Commented Nov 2, 2017 at 9:35
  • In the LB logs, the API call is logged, but the target response code shows "-" whereas the LB response code shows 502 or 504. Does this mean that the LB is not able to handle the traffic, or that my application is not? @KushVyas Commented Jan 9, 2018 at 12:25
  • @Root We have exactly the same problem. Do you still have it, or did you find a solution? Commented Apr 24, 2018 at 14:37
  • @JanDoerrenhaus Yes, we have found the solution. Commented May 7, 2018 at 17:24
  • We are experiencing the exact same issue. Commented Sep 28, 2018 at 11:12

2 Answers


We had the same problem.

In our setup, an AWS Application ELB has a target group of 4 EC2 instances. On each of the EC2 instances, there is an Apache2 which forwards to a Tomcat.

The ELB has a default idle (keep-alive) timeout of 60 seconds. Apache2 has a default KeepAliveTimeout of 5 seconds. Once those 5 seconds are over, Apache2 closes its idle connection to the ELB. However, if a request comes in at just the wrong moment, the ELB accepts it, decides which host to forward it to, and sends it over that connection just as Apache closes it. This results in the 502 error code.
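The timing can be made concrete with a toy model. This is a simplified Python sketch, not how the ELB is actually implemented: it assumes the LB drops a pooled connection as soon as it sees the backend's close, so the only dangerous moment is when a request and the backend's close cross on the wire, modeled here as a hypothetical one_way_delay window around the backend's timeout.

```python
def outcome(idle_gap: float, lb_timeout: float, backend_timeout: float,
            one_way_delay: float = 0.001) -> str:
    """Classify a request dispatched on a pooled LB->backend connection that has
    been idle for idle_gap seconds. All values are in seconds (toy model)."""
    if idle_gap >= lb_timeout:
        # The LB closed the idle connection itself, so it opens a fresh one.
        return "new connection"
    if abs(idle_gap - backend_timeout) <= one_way_delay:
        # The request and the backend's close cross on the wire: the LB gets
        # a reset instead of a response and answers the client with 502.
        return "502 risk"
    if idle_gap > backend_timeout:
        # The backend closed earlier and the LB already noticed.
        return "new connection"
    return "reused"
```

With the defaults from the answer, outcome(5.0005, 60, 5) returns "502 risk". With an LB timeout of 60 and a backend timeout of 120, no idle_gap can land in that window, because the LB always gives up on the connection first.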

The solution: when you have cascading proxies/LBs, either align their KeepAlive timeouts or, preferably, make them a little longer the further down the line you get.
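The rule can be checked mechanically for a whole chain of hops. A small Python sketch follows; the hop names and timeouts passed in are whatever your own setup uses (the example values come from the answer), and check_keepalive_chain is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Sequence


def check_keepalive_chain(hops: Sequence[tuple[str, float]]) -> list[str]:
    """hops lists (name, idle_timeout_seconds) from the client side inwards,
    e.g. [("ELB", 60), ("Apache2", 120)]. Returns one message per hop whose
    idle timeout is not longer than the hop in front of it."""
    problems = []
    for (up_name, up_t), (down_name, down_t) in zip(hops, hops[1:]):
        if down_t < up_t:
            problems.append(f"{down_name} ({down_t}s) closes idle connections before {up_name} ({up_t}s): 502 risk")
        elif down_t == up_t:
            problems.append(f"{down_name} ({down_t}s) is only aligned with {up_name}; a little longer is safer")
    return problems
```

For example, check_keepalive_chain([("ELB", 60), ("Apache2", 5)]) reports Apache2, while [("ELB", 60), ("Apache2", 120)] returns an empty list. Equal timeouts are reported separately because the answer accepts alignment but prefers a longer downstream timeout.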

We set the ELB timeout to 60 seconds and the Apache2 timeout to 120 seconds. Problem gone.
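The answer fixes this on Apache2. If the microservices from the question are registered with the ALB directly, the same rule would presumably apply to each service's own HTTP server. As an illustration only, here is a minimal sketch with Python's standard-library http.server, assuming the 60-second ALB idle timeout from the answer; start and reuses_connection are hypothetical helper names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

LB_IDLE_TIMEOUT = 60       # ALB idle timeout used in the answer (seconds)
SERVER_IDLE_TIMEOUT = 120  # keep idle keep-alive connections open longer than the LB
assert SERVER_IDLE_TIMEOUT > LB_IDLE_TIMEOUT


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow persistent (keep-alive) connections
    timeout = SERVER_IDLE_TIMEOUT  # socket timeout; an idle connection is closed after this

    def do_GET(self) -> None:
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # lets the client reuse the connection
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in this sketch


def start(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Serve in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def reuses_connection(port: int) -> bool:
    """Send two requests over one client connection and report whether the
    same socket carried both (i.e. the server kept it alive)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/")
        conn.getresponse().read()
        first = conn.sock
        conn.request("GET", "/")
        conn.getresponse().read()
        return first is not None and conn.sock is first
    finally:
        conn.close()
```

On the instance you would call something like start("0.0.0.0", 8080) (port hypothetical) and point the target group at it. One caveat: timeout here is a plain socket timeout, so it also drops a client that stalls mid-request for 120 seconds, not only an idle keep-alive connection. Servers in other languages usually expose a dedicated keep-alive or idle timeout setting for this.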


7 Comments

We figured out the issue in our system: it was due to the immediate shutdown of EC2 instances instead of waiting for the draining period. We already had the ELB set to 60 seconds and Apache to 120 seconds.
We are currently having the same issue. When this happens, can we see any log on the Apache side?
@Naga We didn't, no, because Apache does not notice anything being wrong. The ELB access logs show the request with the 502 status code, and the Apache access logs show nothing.
@Jan Thank you for the information! It's actually the same for us. I checked the Apache access log and error log, but I could not find anything. We will try the same settings as you and see how it goes.
This was so difficult to figure out, thanks for this Q/A. It resolved my problem as soon as I increased the KeepAliveTimeout.

Health checks use HTTP/2. I got my EC2 instances running NGINX to become healthy by adding http2 to the listen 80 directive:

listen 80 default_server http2;
