Flink HA unable to recover jobs from the recovery folder in S3; JobManager goes into CrashLoopBackOff indefinitely


The Flink cluster is running version 1.14.3. The JobManager goes into CrashLoopBackOff and is unable to recover the Flink jobs from the HA recovery folder.

ERROR: The recovery file completedCheckpoint9b1238ec066c doesn't exist in S3.

Caused by: org.apache.flink.util.FlinkException: Could not retrieve checkpoint 82949 from state handle under checkpointID-0000000000000082949. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.

Stack trace:

2024-01-19 02:31:22,606 INFO  org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - Create KubernetesLeaderElector dp-streaming-consumer-pp-ha-5d93bb26f04990a772ab9f53b57d99bf-jobmanager-leader with lock identity 5de221ed-fac0-4648-a801-4c87990e9f99.
2024-01-19 02:31:22,606 INFO  org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Starting DefaultLeaderElectionService with KubernetesLeaderElectionDriver{configMapName='dp-streaming-consumer-pp-ha-5d93bb26f04990a772ab9f53b57d99bf-jobmanager-leader'}.
2024-01-19 02:31:22,606 INFO  org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapSharedInformer [] - Starting to watch for dp-streaming-consumer-pp/dp-streaming-consumer-pp-ha-5d93bb26f04990a772ab9f53b57d99bf-jobmanager-leader, watching id:1ec7dd0d-56df-4716-840e-f992a3bc7fd0
2024-01-19 02:31:22,608 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.util.FlinkException: JobMaster for job 38450294dc4b5acecc04295d5dd9d48b failed.
    at org.apache.flink.runtime.dispatcher.Dispatcher.jobMasterFailed(Dispatcher.java:913) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.dispatcher.Dispatcher.jobManagerRunnerFailed(Dispatcher.java:473) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.dispatcher.Dispatcher.handleJobManagerRunnerResult(Dispatcher.java:450) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$runJob$3(Dispatcher.java:427) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456) ~[?:1.8.0_322]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRunAsync$4(AkkaRpcActor.java:455) ~[flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:455) ~[flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:213) ~[flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78) ~[flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163) ~[flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at akka.actor.Actor.aroundReceive(Actor.scala:537) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at akka.actor.Actor.aroundReceive$(Actor.scala:535) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at akka.actor.ActorCell.invoke(ActorCell.scala:548) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at akka.dispatch.Mailbox.run(Mailbox.scala:231) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243) [flink-rpc-akka_9ed50e60-1845-4bdf-9fbd-d2dae3d7aba5.jar:1.14.3]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_322]
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_322]
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_322]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175) [?:1.8.0_322]
Caused by: org.apache.flink.runtime.client.JobInitializationException: Could not start the JobMaster.
    at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:97) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
Caused by: java.util.concurrent.CompletionException: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Failed to initialize high-availability completed checkpoint store
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280) ~[?:1.8.0_322]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Failed to initialize high-availability completed checkpoint store
    at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:316) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:114) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
Caused by: org.apache.flink.runtime.client.JobExecutionException: Failed to initialize high-availability completed checkpoint store
    at org.apache.flink.runtime.scheduler.SchedulerUtils.createCompletedCheckpointStoreIfCheckpointingIsEnabled(SchedulerUtils.java:57) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:180) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:140) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:134) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:346) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:323) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:106) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:94) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
Caused by: org.apache.flink.util.FlinkException: Could not retrieve checkpoint 82949 from state handle under checkpointID-0000000000000082949. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
    at org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils.retrieveCompletedCheckpoint(DefaultCompletedCheckpointStoreUtils.java:111) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils.retrieveCompletedCheckpoints(DefaultCompletedCheckpointStoreUtils.java:89) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.kubernetes.utils.KubernetesUtils.createCompletedCheckpointStore(KubernetesUtils.java:314) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.kubernetes.highavailability.KubernetesCheckpointRecoveryFactory.createRecoveredCompletedCheckpointStore(KubernetesCheckpointRecoveryFactory.java:78) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.SchedulerUtils.createCompletedCheckpointStore(SchedulerUtils.java:91) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.SchedulerUtils.createCompletedCheckpointStoreIfCheckpointingIsEnabled(SchedulerUtils.java:54) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:180) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:140) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:134) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:346) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:323) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:106) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:94) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
Caused by: com.facebook.presto.hive.s3.PrestoS3FileSystem$UnrecoverableS3OperationException: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: SF767KGGHX85APKB; S3 Extended Request ID: 3DmTRovMc6tHzPt6tsA5Qe+cOuf64hZiFBFSudvZHamY3SuoWJNA9v0YwKvOhIv8LZVcrpjvik0=; Proxy: null), S3 Extended Request ID: 3DmTRovMc6tHzPt6tsA5Qe+cOuf64hZiFBFSudvZHamY3SuoWJNA9v0YwKvOhIv8LZVcrpjvik0= (Path: s3://dp-streaming-pipelines-prod/eventwatch-preprod/recovery/default/completedCheckpoint9b1238ec066c)
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$2(PrestoS3FileSystem.java:1108) ~[?:?]
    at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:139) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:1093) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:1078) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.seekStream(PrestoS3FileSystem.java:1071) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$read$1(PrestoS3FileSystem.java:1015) ~[?:?]
    at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:139) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.read(PrestoS3FileSystem.java:1014) ~[?:?]
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_322]
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_322]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_322]
    at java.io.DataInputStream.read(DataInputStream.java:149) ~[?:1.8.0_322]
    at org.apache.flink.fs.s3presto.common.HadoopDataInputStream.read(HadoopDataInputStream.java:96) ~[?:?]
    at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2814) ~[?:1.8.0_322]
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2830) ~[?:1.8.0_322]
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3307) ~[?:1.8.0_322]
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:934) ~[?:1.8.0_322]
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:396) ~[?:1.8.0_322]
    at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.<init>(InstantiationUtil.java:68) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:612) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:595) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:59) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils.retrieveCompletedCheckpoint(DefaultCompletedCheckpointStoreUtils.java:102) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils.retrieveCompletedCheckpoints(DefaultCompletedCheckpointStoreUtils.java:89) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.kubernetes.utils.KubernetesUtils.createCompletedCheckpointStore(KubernetesUtils.java:314) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.kubernetes.highavailability.KubernetesCheckpointRecoveryFactory.createRecoveredCompletedCheckpointStore(KubernetesCheckpointRecoveryFactory.java:78) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.SchedulerUtils.createCompletedCheckpointStore(SchedulerUtils.java:91) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.SchedulerUtils.createCompletedCheckpointStoreIfCheckpointingIsEnabled(SchedulerUtils.java:54) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:180) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:140) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:134) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:346) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:323) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:106) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:94) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: SF767KGGHX85APKB; S3 Extended Request ID: 3DmTRovMc6tHzPt6tsA5Qe+cOuf64hZiFBFSudvZHamY3SuoWJNA9v0YwKvOhIv8LZVcrpjvik0=; Proxy: null)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550) ~[?:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5259) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5206) ~[?:?]
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1512) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$openStream$2(PrestoS3FileSystem.java:1096) ~[?:?]
    at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:139) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:1093) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.openStream(PrestoS3FileSystem.java:1078) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.seekStream(PrestoS3FileSystem.java:1071) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.lambda$read$1(PrestoS3FileSystem.java:1015) ~[?:?]
    at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:139) ~[?:?]
    at com.facebook.presto.hive.s3.PrestoS3FileSystem$PrestoS3InputStream.read(PrestoS3FileSystem.java:1014) ~[?:?]
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_322]
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_322]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_322]
    at java.io.DataInputStream.read(DataInputStream.java:149) ~[?:1.8.0_322]
    at org.apache.flink.fs.s3presto.common.HadoopDataInputStream.read(HadoopDataInputStream.java:96) ~[?:?]
    at java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2814) ~[?:1.8.0_322]
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2830) ~[?:1.8.0_322]
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3307) ~[?:1.8.0_322]
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:934) ~[?:1.8.0_322]
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:396) ~[?:1.8.0_322]
    at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.<init>(InstantiationUtil.java:68) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:612) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:595) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:59) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils.retrieveCompletedCheckpoint(DefaultCompletedCheckpointStoreUtils.java:102) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStoreUtils.retrieveCompletedCheckpoints(DefaultCompletedCheckpointStoreUtils.java:89) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.kubernetes.utils.KubernetesUtils.createCompletedCheckpointStore(KubernetesUtils.java:314) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.kubernetes.highavailability.KubernetesCheckpointRecoveryFactory.createRecoveredCompletedCheckpointStore(KubernetesCheckpointRecoveryFactory.java:78) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.SchedulerUtils.createCompletedCheckpointStore(SchedulerUtils.java:91) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.SchedulerUtils.createCompletedCheckpointStoreIfCheckpointingIsEnabled(SchedulerUtils.java:54) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:180) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:140) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:134) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:346) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:323) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:106) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:94) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322]
2024-01-19 02:31:22,623 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Shutting StandaloneSessionClusterEntrypoint down with application status UNKNOWN. Diagnostics Cluster entrypoint has been closed externally..
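
The trace is long, but only two of its lines identify what is missing: the FlinkException naming the checkpoint and state handle, and the NoSuchKey line carrying the S3 path. Below is a minimal sketch for pulling those values out of a JobManager log. It assumes the log has the same format as the trace above; the function name and both regular expressions are illustrative and not part of Flink.

```python
import re

# Both patterns are assumptions derived from the log lines in this post,
# not from any Flink API or documented log format.
CHECKPOINT_RE = re.compile(
    r"Could not retrieve checkpoint (\d+) from state handle under (\S+?)\."
)
S3_PATH_RE = re.compile(r"NoSuchKey.*?\(Path: (s3://[^)\s]+)\)")


def find_missing_checkpoint(log_text):
    """Return (checkpoint_id, handle_name, s3_path); each part is None if not found."""
    cp = CHECKPOINT_RE.search(log_text)
    path = S3_PATH_RE.search(log_text)
    return (
        int(cp.group(1)) if cp else None,
        cp.group(2) if cp else None,
        path.group(1) if path else None,
    )
```

Applied to the trace in this post, it should return checkpoint 82949, the handle checkpointID-0000000000000082949, and the path s3://dp-streaming-pipelines-prod/eventwatch-preprod/recovery/default/completedCheckpoint9b1238ec066c, which is the object S3 reports as missing.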
