I have a problem when running a custom jar application on a cluster. First, I ran my custom jar application on a local Flink installation:
./bin/flink run /home/osboxes/WordCount.jar --input file:/home/osboxes/texto.txt --output file:/home/osboxes/saida.txt
Then I got
Job has been submitted with JobID bbe63a7295c6376b6fc7ae3ce369c0d5
Program execution finished
Job with JobID bbe63a7295c6376b6fc7ae3ce369c0d5 has finished.
Job Runtime: 191 ms
Now I can look at /home/osboxes/saida.txt and see the results. Everything is OK.
It's time to try the cluster. First I ran a bundled Flink example (/examples/streaming/WordCount.jar) on the cluster:
./bin/flink run-application -t yarn-application ./examples/streaming/WordCount.jar --input hdfs://10.168.202.90:8020/tmp/input.txt --output hdfs://10.168.202.90:8020/tmp/output.txt
The file input.txt was placed in HDFS under /tmp/.
When the job has finished, I can see the results using:
hdfs dfs -cat /tmp/output.txt/2025-01-16--10/part-a32715d9-c7c8-4e05-9fd4-60a622a7407c-0
Everything is OK. So, as far as I can tell, the cluster is working just fine. It's time to try my custom version of WordCount, using:
./bin/flink run-application -t yarn-application /tmp/WordCount.jar --input hdfs://10.168.202.90:8020/tmp/input.txt --output hdfs://10.168.202.90:8020/tmp/output2.txt
The flink user has permission to access /tmp/WordCount.jar. I get a lot of messages from Flink, including one telling me the job was ACCEPTED, but then I get the following exception:
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn Application Cluster
at org.apache.flink.yarn.YarnClusterDescriptor.deployApplicationCluster(YarnClusterDescriptor.java:488)
at org.apache.flink.client.deployment.application.cli.ApplicationClusterDeployer.run(ApplicationClusterDeployer.java:67)
at org.apache.flink.client.cli.CliFrontend.runApplication(CliFrontend.java:212)
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1098)
at org.apache.flink.client.cli.CliFrontend.lambda$mainInternal$9(CliFrontend.java:1189)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.mainInternal(CliFrontend.java:1189)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1157)
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1731527993122_0037 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1731527993122_0037_000001 exited with exitCode: 1
Failing this attempt.Diagnostics: [2025-01-16 10:28:25.516]Exception from container-launch.
Container id: container_1731527993122_0037_01_000001
Exit code: 1
[2025-01-16 10:28:25.520]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
[2025-01-16 10:28:25.521]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
For more detailed output, check the application tracking page: http://cert-tdp-mn01.lab.local:8088/cluster/app/application_1731527993122_0037 Then click on links to logs of each attempt.
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1731527993122_0037
at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1262)
at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:633)
at org.apache.flink.yarn.YarnClusterDescriptor.deployApplicationCluster(YarnClusterDescriptor.java:481)
... 10 more
2025-01-16 10:28:25,716 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Cancelling deployment from Deployment Failure Hook
2025-01-16 10:28:25,720 INFO org.apache.hadoop.yarn.client.RMProxy [] - Connecting to ResourceManager at cert-tdp-mn01.lab.local/10.168.202.90:8050
2025-01-16 10:28:25,723 INFO org.apache.hadoop.yarn.client.AHSProxy [] - Connecting to Application History server at cert-tdp-mn02.lab.local/10.168.202.91:10200
2025-01-16 10:28:25,729 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Killing YARN application
2025-01-16 10:28:25,757 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl [] - Killed application application_1731527993122_0037
2025-01-16 10:28:25,758 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deleting files in hdfs://10.168.202.90:8020/user/flink/.flink/application_1731527993122_0037.
The real clue turned up when I accessed the YARN ResourceManager UI and looked at the logs. The following, from jobmanager.log, is the one I think shows the actual problem:
org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.
at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:215) ~[flink-dist-1.18.0.jar:1.18.0]
at org.apache.flink.yarn.entrypoint.YarnApplicationClusterEntryPoint.getPackagedProgram(YarnApplicationClusterEntryPoint.java:126) ~[flink-dist-1.18.0.jar:1.18.0]
at org.apache.flink.yarn.entrypoint.YarnApplicationClusterEntryPoint.main(YarnApplicationClusterEntryPoint.java:96) [flink-dist-1.18.0.jar:1.18.0]
Caused by: org.apache.flink.client.program.ProgramInvocationException: The program's entry point class 'WordCount' could not be loaded due to a linkage failure.
at org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:493) ~[flink-dist-1.18.0.jar:1.18.0]
at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:153) ~[flink-dist-1.18.0.jar:1.18.0]
at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:65) ~[flink-dist-1.18.0.jar:1.18.0]
at org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691) ~[flink-dist-1.18.0.jar:1.18.0]
at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:213) ~[flink-dist-1.18.0.jar:1.18.0]
... 2 more
Caused by: java.lang.UnsupportedClassVersionError: WordCount has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
at java.lang.ClassLoader.defineClass1(Native Method) ~[?:1.8.0_341]
at java.lang.ClassLoader.defineClass(ClassLoader.java:756) ~[?:1.8.0_341]
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[?:1.8.0_341]
at java.net.URLClassLoader.defineClass(URLClassLoader.java:473) ~[?:1.8.0_341]
at java.net.URLClassLoader.access$100(URLClassLoader.java:74) ~[?:1.8.0_341]
at java.net.URLClassLoader$1.run(URLClassLoader.java:369) ~[?:1.8.0_341]
at java.net.URLClassLoader$1.run(URLClassLoader.java:363) ~[?:1.8.0_341]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_341]
at java.net.URLClassLoader.findClass(URLClassLoader.java:362) ~[?:1.8.0_341]
at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:71) ~[flink-dist-1.18.0.jar:1.18.0]
at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:51) ~[flink-dist-1.18.0.jar:1.18.0]
at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_341]
at org.apache.flink.util.FlinkUserCodeClassLoaders$SafetyNetWrapperClassLoader.loadClass(FlinkUserCodeClassLoaders.java:192) ~[flink-dist-1.18.0.jar:1.18.0]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_341]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_341]
at org.apache.flink.client.program.PackagedProgram.loadMainClass(PackagedProgram.java:479) ~[flink-dist-1.18.0.jar:1.18.0]
at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:153) ~[flink-dist-1.18.0.jar:1.18.0]
at org.apache.flink.client.program.PackagedProgram.<init>(PackagedProgram.java:65) ~[flink-dist-1.18.0.jar:1.18.0]
at org.apache.flink.client.program.PackagedProgram$Builder.build(PackagedProgram.java:691) ~[flink-dist-1.18.0.jar:1.18.0]
at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:213) ~[flink-dist-1.18.0.jar:1.18.0]
... 2 more
So it seems my jar was compiled for Java 11 (class file version 55), while the cluster's Flink runs on Java 8, whose JVM only accepts class files up to version 52. So Flink could not load the class.
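For reference, class file version 55 corresponds to Java 11 and 52 to Java 8 (the stack trace shows the cluster's JVM is 1.8.0_341). A quick way to check what bytecode a build actually produced, independent of any IDE setting, is to read the class-file header. This is just a sketch: it inspects its own bytecode, and `majorVersion` is a helper name I made up, but the same method works on any `.class` entry extracted from `WordCount.jar`:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassVersionCheck {

    /**
     * Reads a class-file header and returns the major version:
     * 52 = Java 8, 55 = Java 11.
     */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {      // class-file magic number
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();                // minor version (ignored)
        return data.readUnsignedShort();         // major version
    }

    public static void main(String[] args) throws Exception {
        // Demonstration: inspect this class's own bytecode. To check the
        // jar from the question, point the stream at a .class entry
        // extracted from WordCount.jar instead.
        try (InputStream in = ClassVersionCheck.class
                .getResourceAsStream("/ClassVersionCheck.class")) {
            System.out.println("class file major version: " + majorVersion(in));
        }
    }
}
```

Run against a class from my jar, this should report 55, matching what the UnsupportedClassVersionError says.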
Is that correct? I have Eclipse with Java 11, but I set the project's compiler compliance level to version 8, so apparently that setting isn't taking effect. Any clues?
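One thing I will double-check (an assumption on my part, since I haven't confirmed how the jar gets produced): the Eclipse compiler compliance level only affects builds done by Eclipse itself. If the jar comes from a Maven build on the command line, the target has to be pinned in the pom, otherwise javac defaults to the bytecode level of the JDK running it (Java 11 here). Something like:

```xml
<properties>
  <!-- force javac to emit Java 8 (class file version 52) bytecode -->
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>
```

The command-line equivalent would be `javac --release 8` (or `-source 8 -target 8` on older JDKs). Either way, after rebuilding, checking the class file version again should confirm whether the fix took.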