146

All of the examples of Kafka producers show the ProducerRecord's key/value pair as not only being the same type (all examples show <String,String>), but also the same value. For example:

producer.send(new ProducerRecord<String, String>("someTopic", Integer.toString(i), Integer.toString(i)));

But in the Kafka docs, I can't seem to find where the key/value concept (and its underlying purpose/utility) is explained. In traditional messaging (ActiveMQ, RabbitMQ, etc.) I've always fired a message at a particular topic/queue/exchange. But Kafka is the first broker that seems to require key/value pairs instead of just a regular ol' string message.

So I ask: What is the purpose/usefulness of requiring producers to send KV pairs?

  • Conceptually, an event has a key, value, timestamp, and optional metadata headers. Here's an example event: Event key: "Alice" Event value: "Made a payment of $200 to Bob" Event timestamp: "Jun. 25, 2020 at 2:06 p.m." Commented Jun 25, 2021 at 18:28

4 Answers

133

Kafka uses the abstraction of a distributed log that consists of partitions. Splitting a log into partitions allows the system to scale out.

Keys are used to determine the partition within a log to which a message gets appended, while the value is the actual payload of the message. The examples are actually not very "good" in this regard; usually you would have a complex type as the value (like a tuple type, JSON, or similar) and you would extract one field as the key.

See: http://kafka.apache.org/intro#intro_topics and http://kafka.apache.org/intro#intro_producers

In general, the key and/or value can be null, too. If the key is null, a random partition will be selected. If the value is null, it can have special "delete" semantics if you enable the log-compaction policy instead of the log-retention policy for a topic (http://kafka.apache.org/documentation#compaction).
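
To make the key/value split concrete, here is a minimal sketch of a keyed producer. It is not from the answer above: the topic name "payments", the user id, and the JSON-ish payload are made-up assumptions. The key is one field extracted from the payload, the value carries the whole message, and null keys/values behave as described above.

 import java.util.Properties;
 import org.apache.kafka.clients.producer.KafkaProducer;
 import org.apache.kafka.clients.producer.ProducerRecord;

 public class KeyedProducerSketch {
     public static void main(String[] args) {
         Properties props = new Properties();
         props.put("bootstrap.servers", "localhost:9092");
         props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
         props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

         try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
             String key = "alice";                                  // field extracted from the payload
             String value = "{\"user\":\"alice\",\"amount\":200}";  // the actual payload

             // Keyed record: the key decides the partition, the value is the payload.
             producer.send(new ProducerRecord<>("payments", key, value));

             // Null key: the producer picks a partition on its own.
             producer.send(new ProducerRecord<>("payments", null, value));

             // Null value: a "tombstone" that marks the key for deletion on a log-compacted topic.
             producer.send(new ProducerRecord<>("payments", key, null));
         }
     }
 }

Because the key alone determines the partition under the default partitioner, all of "alice"'s payments land in the same partition, which is also what the ordering answer further down relies on.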


10 Comments

And notably, keys also play a relevant part in the streaming API of Kafka, with KStream and KTable.
Keys can be used to determine the partition, but it's just a default strategy of the producer. Ultimately, it is the producer who chooses which partition to use.
@gvo Does the key have more uses?
It can be used to keep only one instance of a message per key, as mentioned in the log compaction link. I don't know about other use-cases.
If you specify the partition parameter it will be used, and the key will be "ignored" (of course, the key will still be written into the topic). -- This allows you to have customized partitioning even if you have keys; see the sketch below.
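
As a small illustration of that last comment, continuing the producer sketch from this answer (the topic name and the partition number 2 are arbitrary assumptions): the four-argument ProducerRecord constructor takes an explicit partition, which overrides key-based partitioning while still storing the key with the record.

 // Continues the earlier producer sketch; partition 2 is an arbitrary, assumed choice.
 ProducerRecord<String, String> pinned =
     new ProducerRecord<>("payments", 2, "alice", "{\"user\":\"alice\",\"amount\":200}");
 producer.send(pinned);  // written to partition 2 regardless of how the key "alice" hashes
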
38

Late addition... Specifying the key, so that all messages with the same key go to the same partition, is very important for proper ordering of message processing if you will have multiple consumers in a consumer group on a topic.

Without a key, two related messages could go to different partitions and be processed by different consumers in the group out of order.
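
For non-null keys under default settings, the producer's default partitioner hashes the serialized key and takes the result modulo the number of partitions, which is why a given key keeps landing in the same partition (as long as the partition count doesn't change). A rough, hedged sketch of that mapping, with an assumed partition count of 6:

 import java.nio.charset.StandardCharsets;
 import org.apache.kafka.common.utils.Utils;

 public class KeyToPartitionSketch {
     public static void main(String[] args) {
         int numPartitions = 6;  // assumed partition count for the topic
         for (String key : new String[]{"alice", "bob", "alice"}) {
             byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
             int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
             // "alice" maps to the same partition both times, so her messages stay
             // ordered relative to each other within that partition.
             System.out.printf("key=%s -> partition %d%n", key, partition);
         }
     }
 }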


-1

The class used for the key or value in the consumer depends on the selected deserializer. You specify the deserializers for the key and value in the consumer properties. Here is example code (from the javadoc https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html):

 import java.util.Arrays;
 import java.util.Properties;
 import org.apache.kafka.clients.consumer.ConsumerRecord;
 import org.apache.kafka.clients.consumer.ConsumerRecords;
 import org.apache.kafka.clients.consumer.KafkaConsumer;

 Properties props = new Properties();
 props.put("bootstrap.servers", "localhost:9092");
 props.put("group.id", "test");
 props.put("enable.auto.commit", "true");
 props.put("auto.commit.interval.ms", "1000");
 // The deserializers decide which Java classes the key and value become;
 // here both are plain Strings, matching KafkaConsumer<String, String>.
 props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
 props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
 KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
 consumer.subscribe(Arrays.asList("foo", "bar"));
 while (true) {
     ConsumerRecords<String, String> records = consumer.poll(100);
     for (ConsumerRecord<String, String> record : records)
         System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
 }

You can read about Serializers/Deserializers here: https://www.baeldung.com/kafka-custom-serializer

The main thing you need to understand: these serializers/deserializers have nothing to do with JSON, Avro, or Protobuf as such. They are a different layer of "serialization": they only convert between the raw bytes Kafka stores and the Java types the consumer (or producer) is parameterized with.
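
For illustration, a minimal sketch of a custom value deserializer (the class name and the text-encoded-long format are made up, not from this answer): it only maps the raw bytes to a Java Long; whatever wire format was used to produce those bytes is a separate concern.

 import java.nio.charset.StandardCharsets;
 import java.util.Map;
 import org.apache.kafka.common.serialization.Deserializer;

 public class LongFromTextDeserializer implements Deserializer<Long> {
     @Override
     public void configure(Map<String, ?> configs, boolean isKey) { }  // nothing to configure

     @Override
     public Long deserialize(String topic, byte[] data) {
         if (data == null) return null;  // e.g. a compacted-topic tombstone has a null value
         return Long.parseLong(new String(data, StandardCharsets.UTF_8));
     }

     @Override
     public void close() { }  // no resources to release
 }

You would then set props.put("value.deserializer", "com.example.LongFromTextDeserializer") (package name assumed) and declare the consumer as KafkaConsumer<String, Long>.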


-5

Another interesting use case

We could use the key attribute in Kafka topics for sending user_ids and then plug in a consumer to fetch the streaming events (stored in the value attribute). This could allow you to process per-user histories of event sequences for creating features in your machine learning models.

I still have to find out if this is possible or not. Will keep updating my answer with further details.
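
A hedged sketch of how that idea could look with the plain consumer API -- the topic name "user-events", the group id, and treating the key as a String user_id are all assumptions:

 import java.time.Duration;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.Properties;
 import org.apache.kafka.clients.consumer.ConsumerRecord;
 import org.apache.kafka.clients.consumer.ConsumerRecords;
 import org.apache.kafka.clients.consumer.KafkaConsumer;

 public class UserHistorySketch {
     public static void main(String[] args) {
         Properties props = new Properties();
         props.put("bootstrap.servers", "localhost:9092");
         props.put("group.id", "feature-builder");
         props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
         props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

         Map<String, List<String>> historyByUser = new HashMap<>();
         try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
             consumer.subscribe(Collections.singletonList("user-events"));
             while (true) {
                 ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                 for (ConsumerRecord<String, String> record : records) {
                     // key = user_id, value = the event payload; collect per-user histories
                     historyByUser.computeIfAbsent(record.key(), k -> new ArrayList<>())
                                  .add(record.value());
                 }
             }
         }
     }
 }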

