We are using a Spring Boot Kafka listener to consume messages from Kafka. Currently we have auto-commit enabled, so offsets are committed every 5 seconds (the default auto-commit interval). We poll Kafka for a batch of 100 records at a time and take about 300 milliseconds to process them. In a regular scenario there will be multiple polls and one offset commit every 5 seconds, since the messages are processed fairly quickly. Even if the service pod goes down before the next offset commit, the code is idempotent, so any messages not yet committed will simply be republished/reprocessed.
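For context, this is roughly our current consumer configuration (a sketch in Spring Boot property form; the values match the description above, and the `max-poll-records` setting is an assumption about how we cap the batch at 100):

```properties
# Auto-commit on, default 5-second interval (assumed to be set explicitly here)
spring.kafka.consumer.enable-auto-commit=true
spring.kafka.consumer.auto-commit-interval=5000
# Cap each poll at 100 records
spring.kafka.consumer.max-poll-records=100
```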
However, I am trying to handle the remote scenario where batch processing takes longer than the 5-second window, in which case the offsets would be committed before processing is complete. If processing completes fine, there is no issue. But if there is an error while processing messages that are already marked as consumed, we will lose those messages.
I am looking for ways to handle this without losing messages. I was thinking of changing the ack mode to manual and committing after every batch no matter how long it takes (staying under max.poll.interval.ms, of course), but this would increase the number of offset commits roughly tenfold. I also thought of using an internal timer and acknowledging only when it elapses, say 5 seconds since the last poll. That way the commit would not happen while the current batch is still being processed past the 5-second mark; it would be done only afterwards.
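For reference, the manual-commit variant I was considering looks roughly like this (a configuration sketch, not our actual code; the topic/group names, the `consumerFactory` bean, and the `process` method are placeholders, and it assumes spring-kafka is on the classpath):

```java
import java.util.List;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.support.Acknowledgment;

public class ManualAckSketch {

    // Container factory configured for manual acknowledgment
    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.setBatchListener(true);
        // MANUAL: offsets are committed only when the listener calls acknowledge()
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
        return factory;
    }

    @KafkaListener(topics = "my-topic", groupId = "my-group")
    public void listen(List<ConsumerRecord<String, String>> records, Acknowledgment ack) {
        process(records);   // idempotent processing of the whole batch
        ack.acknowledge();  // commit the batch's offsets only after processing succeeds
    }

    private void process(List<ConsumerRecord<String, String>> records) {
        // placeholder for our actual processing logic
    }
}
```

This is the variant that commits once per batch, which is where the tenfold increase in commit traffic comes from.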
However, I am wondering if this leaves the possibility of a batch of messages waiting to be committed until the next poll, and if this is the last set of messages, they could wait for a long time. I have also tried other ack modes like COUNT and COUNT_TIME, but none of them solve the use case. I am not sure how this can be handled.
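For completeness, these are the container settings I experimented with (a sketch in Spring Boot property form; the threshold values are examples, not our actual settings):

```properties
spring.kafka.consumer.enable-auto-commit=false
# COUNT: commit after ack-count records have been acknowledged
# TIME: commit after ack-time has elapsed
# COUNT_TIME: commit when either threshold is reached
spring.kafka.listener.ack-mode=count_time
spring.kafka.listener.ack-count=100
spring.kafka.listener.ack-time=5s
```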
Any thoughts on this?