
I'm using KafkaSink as the sink in my Flink application, and I need to send stringified JSONs to different Kafka topics based on some key-value pairs (for example, some JSONs go to topic1, others go to another topic, topic2, and so on). But I couldn't find any way in the documentation to choose the Kafka topic based on the incoming data stream. Can someone please help me with this?

NOTE: I'm using Flink version 1.14.3

    DataStream<String> data = .....
    KafkaSink<String> sink = KafkaSink.<String>builder()
            .setBootstrapServers(parameter.get("bootstrap.servers"))
            .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                    .setTopic(parameter.get("kafka.output.topic"))  // hardwired to a single topic
                    .setValueSerializationSchema(new SimpleStringSchema())
                    .build()
            )
            .setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
            .build();
    data.sinkTo(sink);

2 Answers


I haven't tried this, but I believe that rather than using setTopic to hardwire the sink to a specific topic, you can instead implement the serialize method on a custom KafkaRecordSerializationSchema so that each ProducerRecord it returns specifies the topic it should be written to.

Another option would be to create a separate sink object for every topic, and then use a ProcessFunction that fans out to a set of side outputs, each connected to the appropriate sink, as sketched below.
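To illustrate that second option, here's a minimal, untested sketch. The class name TopicRouter, the routing rule (a substring check on the JSON payload), and the topic1Sink / topic2Sink variables are all hypothetical; each sink would be a KafkaSink built with setTopic for its own topic, like in the question.

    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class TopicRouter extends ProcessFunction<String, String> {

        // Records tagged with this side output go to topic2's sink;
        // everything else stays on the main output and goes to topic1's sink.
        public static final OutputTag<String> TOPIC2 = new OutputTag<String>("topic2") {};

        @Override
        public void processElement(String json, Context ctx, Collector<String> out) {
            // Hypothetical routing rule -- replace with your own key-value check.
            if (json.contains("\"target\":\"topic2\"")) {
                ctx.output(TOPIC2, json);
            } else {
                out.collect(json);
            }
        }
    }

Wiring it up, with one sink per topic:

    SingleOutputStreamOperator<String> routed = data.process(new TopicRouter());
    routed.sinkTo(topic1Sink);
    routed.getSideOutput(TopicRouter.TOPIC2).sinkTo(topic2Sink);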


1 Comment

Thanks a lot. Writing a custom KafkaRecordSerializationSchema resolved the issue.

I was able to sink output to multiple Kafka topics by implementing KafkaRecordSerializationSchema with a custom serialize method, as suggested by @DavidAnderson. The code snippet is below.

    import java.io.UnsupportedEncodingException;
    import java.nio.charset.StandardCharsets;

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.errors.SerializationException;

    public class CustomSchema implements KafkaRecordSerializationSchema<Tuple2<String, String>> {

        private final String encoding = StandardCharsets.UTF_8.name();

        @Override
        public ProducerRecord<byte[], byte[]> serialize(Tuple2<String, String> input, KafkaSinkContext kafkaSinkContext, Long timestamp) {
            String topic = input.f0; // the first field of the tuple names the target topic
            String data = input.f1;  // the second field carries the payload
            try {
                byte[] value = data == null ? null : data.getBytes(this.encoding);
                return new ProducerRecord<>(topic, value);
            } catch (UnsupportedEncodingException e) {
                throw new SerializationException("Error when serializing string to byte[] due to unsupported encoding " + this.encoding);
            }
        }
    }

And I configured the Kafka sink to use this via the setRecordSerializer method.
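For completeness, the wiring looks roughly like this, assuming the upstream has already been mapped to Tuple2<String, String> pairs of (topic, payload):

    DataStream<Tuple2<String, String>> routedData = .....
    KafkaSink<Tuple2<String, String>> sink = KafkaSink.<Tuple2<String, String>>builder()
            .setBootstrapServers(parameter.get("bootstrap.servers"))
            .setRecordSerializer(new CustomSchema()) // picks the topic per record
            .setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
            .build();
    routedData.sinkTo(sink);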

2 Comments

How about just using .setTopicSelector(input -> input.f0)?
Maybe we can do that too.
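If your connector version provides setTopicSelector on the serialization schema builder, that is indeed more compact than a hand-written schema. A sketch under that assumption (check that the method exists in your Flink/Kafka connector release):

    KafkaSink<Tuple2<String, String>> sink = KafkaSink.<Tuple2<String, String>>builder()
            .setBootstrapServers(parameter.get("bootstrap.servers"))
            .setRecordSerializer(KafkaRecordSerializationSchema.<Tuple2<String, String>>builder()
                    .setTopicSelector(input -> input.f0) // first field names the topic
                    .setValueSerializationSchema(input -> input.f1.getBytes(StandardCharsets.UTF_8))
                    .build())
            .setDeliverGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
            .build();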
