I am new to Kafka and I understand that message order is only guaranteed within one partition, not across partitions.
What I am not sure about is whether this can create scalability issues, e.g. when a partition becomes a hotspot and needs to be re-partitioned. What would be a solution to that? Repartition, if possible, in a way that we still work with one partition? Or is this a known limitation of Kafka?
1 Answer
> message order is only guaranteed within one partition, not across partitions
One thing missing from this statement is the key. The key of a message/event is important: only messages with the same key are guaranteed to end up in the same partition.
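To make the key's role concrete, here is a minimal producer sketch in Java. The topic name `orders`, the key `customer-42`, and the broker address are placeholder assumptions, not anything from your setup:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key -> same partition, so these three events keep their order.
            for (int i = 0; i < 3; i++) {
                RecordMetadata md = producer.send(
                        new ProducerRecord<>("orders", "customer-42", "event-" + i)).get();
                System.out.printf("key=customer-42 -> partition %d, offset %d%n",
                        md.partition(), md.offset());
            }
            // A null key lets the partitioner spread records across partitions,
            // at the cost of any ordering relationship between them.
            producer.send(new ProducerRecord<>("orders", null, "unkeyed-event")).get();
        }
    }
}
```

The default partitioner hashes the key (murmur2) to pick a partition, so every record with the same key lands in the same partition; a null key instead lets records spread across partitions.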
The so-called hotspot issue typically appears due to a "bad" topic configuration or due to producer key naming/misuse. Some options:
- You can use the "cleanup policy" (log compaction) to get rid of older messages with the same key. For example, if you have 3 events one after another with the same key, Kafka will compact the partition for you and delete the older events, leaving only the one with the latest offset (see the AdminClient sketch after this list).
- A "retention time" can also be set so that messages are deleted after some period of time (also shown in the sketch below).
- More partitions can be added, but keep in mind that existing messages are not moved and the key-to-partition mapping changes, so per-key ordering is not preserved across the change (see the createPartitions call in the sketch below).
- If 90% of your events have the same key, maybe you shouldn't have a key at all; consider making the key null. In that case all messages will be distributed roughly evenly across partitions (as in the null-key send in the producer sketch above). But this decision comes with a cost: message ordering is lost. You can only make this call if you know what later processing/consuming of the topic will look like.
- Use Avro/Protobuf schemas to greatly reduce message sizes if your values are JSONs with repetitive field names (see the Avro sketch at the end).
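Here is a sketch of the configuration side using Kafka's AdminClient. The topic name `orders`, the 7-day retention, and the target of 12 partitions are illustrative assumptions:

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.clients.admin.NewPartitions;
import org.apache.kafka.common.config.ConfigResource;

public class TopicConfigDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");

            // Compact the log (keep only the latest record per key) and
            // also delete segments older than 7 days.
            Map<ConfigResource, Collection<AlterConfigOp>> ops = Map.of(topic, List.of(
                    new AlterConfigOp(new ConfigEntry("cleanup.policy", "compact,delete"),
                            AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("retention.ms", "604800000"),
                            AlterConfigOp.OpType.SET)));
            admin.incrementalAlterConfigs(ops).all().get();

            // Grow the topic to 12 partitions. Existing records stay where they
            // are; only new records see the new key-to-partition mapping.
            admin.createPartitions(Map.of("orders", NewPartitions.increaseTo(12)))
                 .all().get();
        }
    }
}
```

Note that `cleanup.policy=compact,delete` applies both compaction and time-based retention; with plain `compact`, the `retention.ms` setting would not delete old data.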
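And a sketch of switching value serialization to Avro via Confluent's `KafkaAvroSerializer`. This assumes a Schema Registry running at the given URL plus the `kafka-avro-serializer` and Avro dependencies on the classpath; the `Order` schema is made up for illustration:

```java
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AvroProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // assumed broker
        props.put("key.serializer", StringSerializer.class.getName());
        // The Avro serializer registers the schema once and then ships only a
        // schema id plus binary-encoded fields, so repetitive JSON field names
        // are no longer sent with every message.
        props.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081"); // assumed registry

        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"string\"},"
                + "{\"name\":\"amount\",\"type\":\"double\"}]}");

        GenericRecord order = new GenericData.Record(schema);      // hypothetical payload
        order.put("id", "o-1001");
        order.put("amount", 99.5);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "customer-42", order));
        }
    }
}
```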
More info about topic configs: https://docs.confluent.io/platform/current/installation/configuration/topic-configs.html