
When a file is added to my S3 bucket, an S3 PUT event is triggered, which puts a message into SQS. I've configured a Lambda to be triggered as soon as a message is available.

In the Lambda function, I'm sending an API request to run a task on an ECS Fargate container, with environment variables containing the message received from SQS. In the container I use the message to download the file from S3 and process it; on successful processing I wish to delete the message from SQS.
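For context, the Lambda-to-ECS handoff described above might be sketched like this (the cluster, task definition, container name, subnet, and environment variable names are placeholders, not values from the question):

```python
def build_run_task_args(message_body, receipt_handle,
                        cluster="my-cluster",
                        task_definition="file-processor:1",
                        container_name="processor"):
    """Build kwargs for ecs.run_task, passing the SQS message (and its
    receipt handle) to the Fargate container as environment variables.
    Note: with the SQS trigger, the receipt handle goes stale as soon as
    the Lambda returns successfully, because the message is auto-deleted."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": ["subnet-REPLACE"],  # placeholder
                "assignPublicIp": "ENABLED",
            }
        },
        "overrides": {
            "containerOverrides": [{
                "name": container_name,
                "environment": [
                    {"name": "SQS_MESSAGE_BODY", "value": message_body},
                    {"name": "SQS_RECEIPT_HANDLE", "value": receipt_handle},
                ],
            }]
        },
    }

def handler(event, context):
    import boto3  # imported here so the helper stays testable without AWS
    ecs = boto3.client("ecs")
    for record in event["Records"]:  # SQS event source batch
        ecs.run_task(**build_run_task_args(record["body"],
                                           record["receiptHandle"]))
```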

However, the message gets deleted from SQS automatically after my Lambda executes. Is there any way I can configure the Lambda not to automatically delete the SQS message (other than purposely raising an exception and failing the Lambda), so that I can programmatically delete the message from my container?

Update: Consider this scenario which I wish to achieve.

  1. Message enters SQS queue
  2. Lambda takes the message & runs ECS API and finishes without deleting the msg from queue.
  3. Msg is in-flight.
  4. ECS container runs the task and deletes msg from queue on successful processing. If container fails, after the visibility timeout the message will re-enter the queue and the lambda will be triggered again and the cycle will repeat from step 1.
  5. If container fails more than a certain number of times, only then will message go from in-flight to DLQ.

This all currently works only if I purposely raise an exception on the lambda and I'm looking for a similar solution without doing this.
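In that flow, step 4 boils down to the container calling DeleteMessage itself once processing succeeds. A minimal sketch, assuming the Lambda passed the queue URL and receipt handle in as environment variables (the variable names are made up):

```python
import os

def delete_processed_message(sqs_client, queue_url, receipt_handle):
    """Remove the message from the queue after successful processing.
    If this is never reached (the task crashed), the message becomes
    visible again after the visibility timeout and is redelivered."""
    sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)

def main():
    import boto3
    sqs = boto3.client("sqs")
    # ... download the file from S3 and process it here ...
    delete_processed_message(sqs, os.environ["QUEUE_URL"],
                             os.environ["SQS_RECEIPT_HANDLE"])

if __name__ == "__main__":
    main()
```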

  • Add a copy of the message to an in-progress SQS queue (or DynamoDB table) and delete that later when the associated task is complete. Commented Sep 11, 2020 at 15:43
  • @jarmod Yes, I thought of that, but in case my container task fails I also want it to retry the task with that message; otherwise the message will just be lying in SQS. All of this works if I raise an exception and fail the Lambda purposely, but I don't feel that is best practice. Commented Sep 11, 2020 at 15:55
  • You could add a scheduled Lambda that queries the in-progress queue (or DB) a few times per day as needed, determines if a given workflow has exceeded its maximum TTL, and then re-add that message to the original SQS queue. Commented Sep 11, 2020 at 16:05
  • If the Lambda function signals back failure, it might put the message back on the queue. (Or, more accurately, the message reappears after the invisibility period expires.) You could configure the Dead Letter Queue to activate after a given number of attempts. Commented Sep 11, 2020 at 23:19

2 Answers


This behaviour is intended: as long as SQS is configured as a Lambda trigger, once the function returns (i.e. completes execution without error) the message is automatically deleted.

The way I see it, to achieve the behaviour you're describing you have 4 options:

  • Remove SQS as the Lambda trigger and instead execute the Lambda function on a schedule and poll the queue yourself. The Lambda will read messages that are available, but unless you delete them explicitly they will become available again once their visibility timeout expires. You can achieve this with a CloudWatch Events schedule.
  • Remove SQS as Lambda trigger and instead execute the Lambda Function explicitly. Similar to the above but instead of executing on a schedule all the time, the Lambda function could be triggered by the producer of the message itself.
  • Keep the SQS Lambda trigger and store the message in an alternative SQS Queue (as suggested by @jarmod in a comment above).
  • Configure the producer of the message to publish to an SNS topic and subscribe two SQS queues to that topic. One of the queues will trigger a Lambda function; the other will be consumed by your ECS tasks.
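The first option (polling the queue yourself from a scheduled Lambda) might look like this sketch, where the queue URL and dispatch logic are placeholders:

```python
def poll_once(sqs_client, queue_url, dispatch):
    """Read up to 10 messages WITHOUT deleting them; each one simply
    becomes visible again after its visibility timeout unless something
    else (e.g. the ECS task) deletes it explicitly."""
    resp = sqs_client.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=10,  # long polling
    )
    messages = resp.get("Messages", [])
    for msg in messages:
        dispatch(msg["Body"], msg["ReceiptHandle"])  # e.g. call ecs.run_task
    return len(messages)

def handler(event, context):
    import boto3
    sqs = boto3.client("sqs")
    poll_once(sqs,
              "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",
              dispatch=lambda body, handle: None)  # plug in the ECS call
```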

Update

Based on the new info provided, you have another option:

Leave the event flow as it is and let the message in SQS be deleted by the Lambda. Then, in your ECS task, handle the failure state and put a new message in SQS with the same payload/body. This allows you to retry indefinitely.

There's no reason why the SQS message has to be the exact same one; what you're interested in is the body/payload.

You might want to consider adding a mechanism to set a limit to these retries and post a message to a DLQ.
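That retry cap could be sketched like this, assuming the retry count travels with the message as an SQS message attribute (the attribute name and limit here are made up, not part of the original answer):

```python
def requeue_or_dead_letter(sqs_client, queue_url, dlq_url, body,
                           retry_count, max_retries=3):
    """On task failure, put the payload back on the main queue with an
    incremented retry counter; past the limit, route it to the DLQ."""
    target = queue_url if retry_count < max_retries else dlq_url
    sqs_client.send_message(
        QueueUrl=target,
        MessageBody=body,
        MessageAttributes={
            "retryCount": {"DataType": "Number",
                           "StringValue": str(retry_count + 1)},
        },
    )
    return target
```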


Comments

With respect to your 2nd point: if I trigger the Lambda function from the producer of the message itself (the S3 PUT event) that puts the message in SQS, won't there be a possibility that the Lambda executes before the message has been put in SQS? With respect to your 3rd & 4th points: if my container task fails, I want it to retry the task with that message; otherwise the message will just be lying in SQS.
In #2 I was assuming that your producer was a process itself; in the case of an S3 event you can't do both things (put a msg in SQS and trigger the Lambda). Let's say you trigger only your Lambda function: what do you need the SQS for? The Lambda will already be called with the same content as the S3 upload event, basically the same thing that ends up in SQS. In that case your architecture would look like this: S3 -> S3 PUT event -> Lambda Fn -> SQS msg -> ECS.
Regardless of my answer and even your original question, there isn't any scenario in which ECS handles retries for you like a Lambda function does. If your ECS task fails, you'll have to make it retry on your own.
"SQS msg -> ECS": how would this work? I need a Lambda function that invokes the ECS API to run the task.

One solution I can think of: remove the Lambda triggered by the SQS queue and instead create a CloudWatch alarm on the queue. When the alarm triggers, scale out the ECS task; when there's no item in the queue, scale the ECS task back down. Let the ECS task just poll the queue and handle all the messages.
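The poll-and-delete loop inside the ECS task could be as simple as this sketch (the queue URL and process() are placeholders; the iterations parameter exists only to make the loop stoppable):

```python
def run_worker(sqs_client, queue_url, process, iterations=None):
    """Long-poll the queue; delete a message only after process() succeeds,
    so a failure leaves it in flight to reappear after the visibility
    timeout."""
    done = 0
    while iterations is None or done < iterations:
        resp = sqs_client.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            try:
                process(msg["Body"])
            except Exception:
                continue  # leave it in flight; it will come back
            sqs_client.delete_message(
                QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        done += 1
```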

