
I read the following in the Kafka docs:

  • The way consumption is implemented in Kafka is by dividing up the partitions in the log over the consumer instances so that each instance is the exclusive consumer of a "fair share" of partitions at any point in time.
  • Kafka only provides a total order over records within a partition, not between different partitions in a topic.
  • Per-partition ordering combined with the ability to partition data by key is sufficient for most applications.
  • However, if you require a total order over records this can be achieved with a topic that has only one partition, though this will mean only one consumer process per consumer group.
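The "partition data by key" idea from the docs above can be sketched as a toy model (this is just an illustration — Kafka's default partitioner actually hashes the key bytes with murmur2, not Python's `hash()`):

```python
# Toy model of key-based partitioning: records with the same key always
# land in the same partition, so per-key order is preserved even though
# there is no total order across partitions.

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Same key -> same partition index (illustrative only).
    return hash(key) % NUM_PARTITIONS

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for i, key in enumerate(["user-a", "user-b", "user-a", "user-c", "user-a"]):
    partitions[partition_for(key)].append((key, f"event-{i}"))

# All "user-a" records sit in one partition, still in send order.
user_a_events = [v for recs in partitions.values() for k, v in recs if k == "user-a"]
print(user_a_events)  # ['event-0', 'event-2', 'event-4']
```

This is why per-partition ordering is "sufficient for most applications": anything that only needs ordering *per key* (per user, per account, …) gets it for free, while different keys can still be spread over many partitions.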

I read the following on this page:

  • Consumers read from any single partition, allowing you to scale throughput of message consumption in a similar fashion to message production.
  • Consumers can also be organized into consumer groups for a given topic — each consumer within the group reads from a unique partition and the group as a whole consumes all messages from the entire topic.
  • If you have more consumers than partitions then some consumers will be idle because they have no partitions to read from.
  • If you have more partitions than consumers then consumers will receive messages from multiple partitions.
  • If you have equal numbers of consumers and partitions, each consumer reads messages in order from exactly one partition.
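The three cases in the list above can be sketched with a toy round-robin assignment (a simplification — Kafka's real assignors, such as range, round-robin, and sticky, are configurable and handle rebalancing):

```python
def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    # Toy round-robin: partition i goes to consumer i % len(consumers).
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# More consumers than partitions: some consumers sit idle.
print(assign([0, 1], ["c1", "c2", "c3"]))       # {'c1': [0], 'c2': [1], 'c3': []}

# More partitions than consumers: some consumers get several partitions.
print(assign([0, 1, 2, 3], ["c1", "c2"]))       # {'c1': [0, 2], 'c2': [1, 3]}

# Equal numbers: exactly one partition per consumer.
print(assign([0, 1, 2], ["c1", "c2", "c3"]))    # {'c1': [0], 'c2': [1], 'c3': [2]}
```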

Doubts

  1. Does this mean that a single partition cannot be consumed by multiple consumers? Can't we have a single partition and a consumer group with more than one consumer, and make them all consume from that single partition?

  2. If a single partition can be consumed by only a single consumer, why was this design decision made?

  3. What if I need a total order over records and still need them to be consumed in parallel? Is that impossible in Kafka, or does such a scenario not make sense?

1 Answer

  1. Within a consumer group, at any given time a partition can only be consumed by a single consumer. No, you can't have two consumers within the same group consuming from the same partition at the same time.

  2. Kafka consumer groups allow multiple consumers to "sort of" behave like a single entity. The group as a whole should consume each message only once. If multiple consumers in a group were to consume the same partitions, those records would be processed multiple times.

    If you need to consume a partition multiple times, make sure those consumers are in different groups.
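The "different groups can each consume the same partition" point can be modeled as each group keeping its own offset into the shared log (a toy model — real offsets are tracked by the broker in the internal `__consumer_offsets` topic):

```python
# Toy model: one partition (an append-only list) plus per-group offsets.
# Each group tracks its own position, so two groups independently read
# every record, while within one group a partition has a single owner.

log = ["m1", "m2", "m3"]
offsets = {"group-a": 0, "group-b": 0}

def poll(group: str):
    # Return the next unread record for this group and advance its offset,
    # or None when the group has caught up with the log.
    pos = offsets[group]
    if pos >= len(log):
        return None
    offsets[group] += 1
    return log[pos]

a_reads = [poll("group-a") for _ in range(3)]
b_reads = [poll("group-b") for _ in range(3)]
print(a_reads, b_reads)  # both groups see all of ['m1', 'm2', 'm3']
```

Because the offset is per group rather than per record, consuming a message in one group never hides it from another group.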

  3. When processing needs to happen in order (serially), at any point in time there is only a single task to do. If you have records 1, 2 and 3 and want to process them in order, you cannot do anything else until record 1 has been processed; the same goes for records 2 and 3. So what would you do in parallel?
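One way to see why a total order rules out parallelism: when each record's processing depends on the state produced by the previous one, the work forms a chain that extra consumers cannot shorten (an illustrative sketch, not Kafka code):

```python
from functools import reduce

# Records that must be applied in order: each one transforms the state
# produced by the one before it, so there is never more than one
# runnable task at a time.
records = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]

def apply_record(balance: int, record) -> int:
    op, amount = record
    return balance + amount if op == "deposit" else balance - amount

# The only correct evaluation is the sequential fold:
final = reduce(apply_record, records, 0)
print(final)  # 75

# Processing ("withdraw", 30) before ("deposit", 100) would briefly drive
# the balance negative: the order itself carries meaning, which is why a
# total-order requirement implies a single consumer.
```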


9 Comments

I was thinking "total order" means "ordering between all messages", which is not possible when we have more than one partition (say M1 and M3 are in P1 and M2 and M4 are in P2; we can't tell whether M2 arrived before M3 or vice versa. Or can we actually tell that?) However, it seems that "total ordering" is more related to the "order of processing/consumption", and that's why more than one consumer consuming from one partition does not make sense? Is this interpretation correct?
Kafka only guarantees ordering within partitions, so total message ordering is only possible with a single partition. Kafka at its heart is a distributed log; messages are only appended at the tail of the log. The ordering guarantee is about how messages are stored in order (as they arrive), allowing consumers to read them in that same order. So yes, your interpretation is now correct.
One unrelated question: I read that (1) a message is "committed" when all in-sync replicas have applied it to their log, and (2) "any committed message will not be lost, as long as at least one in sync replica is alive". My doubt is: doesn't being committed mean written to the file system? And if so, how can committed messages get lost when no replica is alive?
No, in this scenario committed does not necessarily mean "written to the file system". Kafka never explicitly flushes its writes but relies on the operating system to do it. Once the replicas have the message, it is considered committed. Multiple machines would have to fail (within a short interval) at that point in order to lose the message. In practice the message will also often be written to disk shortly afterwards.
I am also having a bit of trouble understanding the first point. Why is that a good decision? What if you want to scale consumption up or down based on throughput?