
I've used Apache Flink for batch processing for a while, but now we want to convert this batch job to a streaming job. The problem I run into is how to run end-to-end tests.

How it worked in a batch job

When using batch processing we created end-to-end tests using cucumber.

  • Fill up the HBase table we read from
  • Run the batch job
  • Wait for it to finish
  • Verify the result

The problem in a streaming job

We would like to do something similar with the streaming job, except a streaming job never really finishes.

So:

  • Fill up the message queue we read from
  • Run the streaming job
  • Wait for it to finish (how?)
  • Verify the result

We could just wait 5 seconds after every test and assume everything has been processed, but that would slow everything down a lot.
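One middle ground between a fixed sleep and proper job termination is to poll the output store until the expected results appear, capped by a timeout. A minimal sketch in plain Java; `countResults()` is a hypothetical placeholder for whatever query your verification step runs against the sink:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

public class AwaitResults {

    /**
     * Polls `check` every `pollInterval` until it returns true or `timeout`
     * elapses. Returns true if the condition was met in time.
     */
    public static boolean await(Supplier<Boolean> check,
                                Duration timeout,
                                Duration pollInterval) throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            if (check.get()) {
                return true;
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return check.get(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        // countResults() stands in for querying the job's sink
        // (e.g. an HBase scan or a query against a test database).
        boolean done = await(() -> countResults() >= 3,
                             Duration.ofSeconds(10),
                             Duration.ofMillis(100));
        System.out.println(done ? "results arrived" : "timed out");
    }

    private static int count = 0;

    private static int countResults() {
        return ++count; // stand-in; a real test would query the sink
    }
}
```

Fast tests still stay fast (polling returns as soon as the results show up), and only genuinely failing tests pay the full timeout.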

Question:

What are some ways or best practices to run end-to-end tests on a streaming Flink job without forcibly terminating the Flink job after x seconds?


1 Answer


Most Flink DataStream sources, if they are reading from a finite input, will inject a watermark with value Long.MAX_VALUE when they reach the end, after which the job will terminate.
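In other words, a job wired to a bounded source runs to completion on its own, so `execute()` blocks until the job finishes and the test can verify afterwards. A minimal sketch, assuming the Flink DataStream API is on the classpath; `fromElements` stands in for whatever bounded test input you feed the job:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FiniteSourceJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // fromElements is a finite source: after emitting the last element it
        // sends a final Long.MAX_VALUE watermark and completes, so the whole
        // job terminates on its own.
        env.fromElements("a", "b", "c")
           .map(String::toUpperCase)
           .print();

        // Blocks until the (finite) job has finished -- no arbitrary sleep needed.
        env.execute("end-to-end-test");
    }
}
```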

The Flink training exercises illustrate one approach to end-to-end testing of Flink jobs. I suggest cloning the GitHub repo and looking at how the tests are set up. They use a custom source and sink and redirect the input and output for testing.
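A common shape for such a test sink is one that collects everything the job emits into a static list, which the test asserts on after `execute()` returns. A hedged sketch of that pattern (the class and field names here are illustrative, not the actual training-repo API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.flink.streaming.api.functions.sink.SinkFunction;

// Collects everything the job emits into a static, thread-safe list.
// The list is static because Flink serializes and redistributes sink
// instances, so instance fields would not be visible to the test.
public class CollectingSink<T> implements SinkFunction<T> {

    public static final List<Object> VALUES =
            Collections.synchronizedList(new ArrayList<>());

    @Override
    public void invoke(T value, SinkFunction.Context context) {
        VALUES.add(value);
    }
}
```

A test would clear `VALUES`, run the job against a finite source with this sink attached, and assert on the collected list once `execute()` returns. Note this only works when the job runs in a single local JVM, which is the usual setup for this kind of test.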

This topic is also discussed a bit in the documentation.
