
There were 2 pods running in my microservice, and both of them got restarted with the Kubernetes reason OOMKilled.

[Image: dashboard of pod restarts by termination reason]

The above dashboard uses the following query:

    sum(increase(kube_pod_container_status_last_terminated_reason{cluster="prod_cluster",container=~"$service"}[$__range])) by (reason,container,namespace,pod) > 0

We analysed the memory and CPU trends for these pods:

[Image: CPU trend]
[Image: Memory trend]

As we can see, CPU and memory look normal. These are the specs for this service:

[Image: specs file]

The average CPU and memory consumption for both pods was also normal. Next we suspected connection issues (with downstream services, databases, Kafka), but these were checked and nothing was observed. This led us to believe the issue might have been at the node level, so we checked the node's memory consumption and realised it was always fine:

[Image: container memory]
[Image: node memory trend]

As seen, container and node memory are both fine, with no spikes or leaks observed. We also analysed the traffic patterns and the latencies of all the downstream systems but still could not find anything.
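Since the dashboard only shows scraped samples, it can also help to read the termination record straight from the Kubernetes API. A minimal sketch, assuming a hypothetical pod name my-service-abc123 in a namespace prod (replace both with your own):

    # Print each container's last termination state: the reason (e.g. OOMKilled),
    # the exit code (137 for an OOM kill) and when it happened.
    kubectl -n prod get pod my-service-abc123 \
      -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\t"}{.lastState.terminated.finishedAt}{"\n"}{end}'

    # Look for OOM-related events cluster-wide (e.g. a system OOM reported by the kubelet).
    kubectl get events -A | grep -i oom

Note that a container-level (cgroup) OOM kill can happen even when the node as a whole has plenty of free memory, because the kernel enforces the container's own memory limit.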

We analysed all possible explanations for the OOMKilled status but could not reach anything conclusive. We expected to see a memory breach.
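One way to check for a short-lived breach that averaged dashboards can hide is to compare the peak working set against the configured limit over the window around the restarts. A rough sketch against the Prometheus HTTP API, assuming a hypothetical Prometheus endpoint and pod-name pattern (prometheus:9090, my-service-.*, namespace prod):

    # Peak memory working set per container over the last day (cAdvisor metric).
    curl -sG 'http://prometheus:9090/api/v1/query' \
      --data-urlencode 'query=max_over_time(container_memory_working_set_bytes{namespace="prod",pod=~"my-service-.*",container!=""}[1d])'

    # Configured memory limits for comparison (kube-state-metrics).
    curl -sG 'http://prometheus:9090/api/v1/query' \
      --data-urlencode 'query=kube_pod_container_resource_limits{namespace="prod",pod=~"my-service-.*",resource="memory"}'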

1 Answer

It is possible that the memory limit set on the pod is too low. You can use the kubectl describe pod $POD_NAME command to check the events.
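A minimal sketch of that check, with a hypothetical pod name and namespace substituted for $POD_NAME:

    # Events plus the container's last state; an OOM-killed container shows
    # "Reason: OOMKilled" and "Exit Code: 137" under Last State.
    kubectl -n prod describe pod my-service-abc123

    # The requests/limits the container is actually running with.
    kubectl -n prod get pod my-service-abc123 \
      -o jsonpath='{.spec.containers[0].resources}'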

I recommend increasing the memory limit.
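For example, a sketch of raising the limit on a Deployment, assuming a hypothetical Deployment name my-service and placeholder sizes (pick values based on your observed working set):

    # Raise the memory limit; keeping request == limit gives the pod the
    # Guaranteed QoS class, so it is not evicted under node memory pressure.
    kubectl -n prod set resources deployment/my-service \
      --limits=memory=1Gi --requests=memory=1Gi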


2 Comments

The specs sheet is attached; the limits are already higher than the pod's actual memory and CPU usage.
Try increasing the limits anyway and check whether the issue persists.
