
I'm encountering a Flink job failure and would appreciate any input on what might be misconfigured:

2025-07-28 17:30:52
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
...
Caused by: java.lang.Exception: Failed to abort transactions with label postgres_test-test-0-1
...
Caused by: com.starrocks.data.load.stream.exception.StreamLoadFailException: Could not get load state because of incorrect response status code 404, label: postgres_test-test-0-1, response body: <HTML><HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD><BODY>
<H1>Not Found</H1>
</BODY></HTML>

Context:

Project repository: https://github.com/iman-sandbox/flink-postgres-to-starrocks

Flink CDC pipeline reading from Postgres and writing to StarRocks with exactly-once semantics.

The sink label prefix is postgres_test.

The errors occur when the sink tries to abort transactions during a job restart.

Details:

Flink version: 1.17.2

Flink connector version: 1.2.10

StarRocks version: 3.5.2

The sink config includes sink.label-prefix='postgres_test', sink.wait-for-continue.timeout-ms='60000', and exactly-once semantics.
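For readers without the repository at hand, a sink definition carrying these options might look roughly like the sketch below. Only the label prefix, the wait-for-continue timeout, and the exactly-once setting come from the configuration described above; the column list, host names, ports, database and table names, and credentials are placeholders and are not copied from the project.

```sql
-- Illustrative sketch only, not the exact DDL from the repository.
-- Columns, hosts, ports, database/table names, and credentials are placeholders.
CREATE TABLE starrocks_test (
    id INT,
    name STRING,
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'starrocks',
    -- Assumed FE MySQL-protocol endpoint (placeholder host and port)
    'jdbc-url' = 'jdbc:mysql://starrocks-fe:9030',
    -- Assumed FE HTTP endpoint used for Stream Load (placeholder host and port)
    'load-url' = 'starrocks-fe:8030',
    'database-name' = 'example_db',
    'table-name' = 'example_table',
    'username' = 'root',
    'password' = '',
    -- The three settings mentioned in the question
    'sink.semantic' = 'exactly-once',
    'sink.label-prefix' = 'postgres_test',
    'sink.wait-for-continue.timeout-ms' = '60000'
);
```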

Questions:

What does "Recovery is suppressed by NoRestartBackoffTimeStrategy" indicate in my context? Is it due to a missing restart strategy or disabled checkpointing?

Why might the sink aborter fail to retrieve the load state (404)? Could this be caused by misconfigured sink properties or missing endpoints?

Are there any recommendations or config tweaks (Flink or StarRocks) to ensure cleanup of lingering transactions and a successful restart? Is there a Docker image I can simply use for my use case (real-time data syncing from Postgres to StarRocks with Flink SQL)?

Here is a screenshot of the Apache Flink JobManager dashboard: [image: Apache Flink Jobmanager dashboard]

Here are the Apache Flink JobManager logs:

Caused by: com.starrocks.data.load.stream.exception.StreamLoadFailException: Could not get load state because of incorrect response status code 404, label: postgres_test-test-0-1, response body: <HTML><HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD><BODY>
<H1>Not Found</H1>
</BODY></HTML>

    at com.starrocks.data.load.stream.DefaultStreamLoader.getLabelState(DefaultStreamLoader.java:471)
    at com.starrocks.data.load.stream.DefaultStreamLoader.getLoadStatus(DefaultStreamLoader.java:444)
    at com.starrocks.connector.flink.table.sink.LingeringTransactionAborter.tryAbortTransaction(LingeringTransactionAborter.java:172)
    ... 17 more
2025-07-28 14:00:52,104 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Job 07781516ad3e3fbe257d044cab1baf2d has been registered for cleanup in the JobResultStore after reaching a terminal state.
2025-07-28 14:00:52,107 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Stopping the JobMaster for job 'insert-into_default_catalog.default_database.starrocks_test' (07781516ad3e3fbe257d044cab1baf2d).
2025-07-28 14:00:52,109 INFO  org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore [] - Shutting down
2025-07-28 14:00:52,109 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Disconnect TaskExecutor 172.18.0.5:43843-1eeb2e because: Stopping JobMaster for job 'insert-into_default_catalog.default_database.starrocks_test' (07781516ad3e3fbe257d044cab1baf2d).
2025-07-28 14:00:52,110 INFO  org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool [] - Releasing slot [06fab36829286332376141a41e5e864c].
2025-07-28 14:00:52,111 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Close ResourceManager connection 5b0f69b6fc769dadd140c550530829aa: Stopping JobMaster for job 'insert-into_default_catalog.default_database.starrocks_test' (07781516ad3e3fbe257d044cab1baf2d).
2025-07-28 14:00:52,112 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Disconnect job manager [email protected]://flink@jobmanager:6123/user/rpc/jobmanager_2 for job 07781516ad3e3fbe257d044cab1baf2d from the resource manager.

Any help or pointers would be great.

  • I'm not sure what's wrong, but you haven't given the task managers much memory. The default is taskmanager.memory.process.size: 1728m. Commented Jul 28 at 15:51
  • The Postgres table only has five records in total, but I'll try it with more memory. Thank you. Commented Jul 28 at 19:17
