In my project, GKE runs many jobs daily. Sometimes I see that a job runs twice: the first time partially and the second time fully, although "restartPolicy: Never" is defined. It happens very rarely (about once per 200-300 runs).
This is an example:
I 2020-12-03T00:12:45Z Started container mot-test-deleteoldvalidations-container 
I 2020-12-03T00:12:45Z Created container mot-test-deleteoldvalidations-container 
I 2020-12-03T00:12:45Z Successfully pulled image "gcr.io/xxxxx/mot-del-old-validations:v16" 
I 2020-12-03T00:12:40Z Pulling image "gcr.io/xxxxx/mot-del-old-validations:v16" 
I 2020-12-03T00:12:39Z Stopping container mot-test-deleteoldvalidations-container 
I 2020-12-03T00:01:59Z Started container mot-test-deleteoldvalidations-container 
I 2020-12-03T00:01:59Z Created container mot-test-deleteoldvalidations-container 
I 2020-12-03T00:01:59Z Successfully pulled image "gcr.io/xxxx/mot-del-old-validations:v16" 
I 2020-12-03T00:01:40Z Pulling image "gcr.io/xxxxx/mot-del-old-validations:v16" 
From job's YAML:
spec:
  backoffLimit: 0
  completions: 1
  parallelism: 1
        resources:
          limits:
            cpu: "1"
            memory: 2500Mi
          requests:
            cpu: 500m
            memory: 2Gi
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes: 
The reason given for stopping the container is "Killing". How can I avoid this behavior?
 
                        
As you mention in the comment section, your restartPolicy is set to Never. You have also set spec.backoffLimit, spec.completions and spec.parallelism, which should work. However, the Kubernetes Jobs documentation (section "Handling Pod and container failures") says that this behavior is possible and is not considered a problem:
Note that even if you specify .spec.parallelism = 1 and .spec.completions = 1 and .spec.template.spec.restartPolicy = "Never", the same program may sometimes be started twice.
In addition, the CronJob documentation recommends as a best practice that jobs be idempotent.
As your whole job manifest is still a mystery, two workarounds come to mind. Depending on the scenario, they might help.
First workaround
Use podAntiAffinity, which prevents a second pod with the same label/selector from being scheduled into the same topology domain (for example, the same node) as the first.
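A minimal sketch of the first workaround, written as a fragment of the Job's pod template. The label app: mot-del-old-validations is a hypothetical name chosen for illustration; it is not taken from the original manifest. The rule only has an effect while the first pod still exists and carries the matching label.

```yaml
# Fragment of the Job spec (spec.template), not a complete manifest.
# Assumption: the label "app: mot-del-old-validations" is hypothetical.
template:
  metadata:
    labels:
      app: mot-del-old-validations
  spec:
    affinity:
      podAntiAffinity:
        # Hard rule: do not schedule this pod into a topology domain
        # that already holds a pod matching the selector below.
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: mot-del-old-validations
            # kubernetes.io/hostname makes every node its own domain.
            topologyKey: kubernetes.io/hostname
    restartPolicy: Never
```

With topologyKey: kubernetes.io/hostname the rule only blocks a second pod on the same node, so on a multi-node cluster the duplicate could still land elsewhere. A topologyKey whose value is shared by all nodes (for example a region label) would widen the scope, at the cost of also blocking any other pod that uses the same label.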
Second workaround
Use an initContainer as a lock: the first pod creates the lock, and the second pod, if it detects the lock, waits 3-5 seconds and exits.
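One way the second workaround could look, as a hedged sketch rather than a tested setup. It assumes a shared volume that every pod of the Job can mount (here a hypothetical PersistentVolumeClaim named job-locks with ReadWriteMany access), the busybox image, and that the Job controller's job-name label is present on the pod. The lock is a directory named after the Job, so each run of a CronJob gets its own lock.

```yaml
# Fragment of the pod spec (spec.template.spec), not a complete manifest.
# Assumptions: PVC "job-locks" and the lock path are hypothetical.
initContainers:
  - name: acquire-lock
    image: busybox:1.36
    env:
      # Job name, read from the label the Job controller sets on its pods.
      - name: JOB_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.labels['job-name']
    command:
      - sh
      - -c
      - |
        # mkdir fails if the directory already exists,
        # so only the first pod of this Job gets past this step.
        if mkdir "/locks/${JOB_NAME}.lock"; then
          echo "lock acquired"
        else
          echo "lock already held, exiting"
          sleep 5
          exit 1
        fi
    volumeMounts:
      - name: locks
        mountPath: /locks
volumes:
  - name: locks
    persistentVolumeClaim:
      claimName: job-locks
```

Because restartPolicy is Never, a failing init container marks the whole pod as failed, so the main container of the second pod never starts. The trade-off is that if the first pod was killed partway through, the second pod will not finish its work, and old lock directories need to be cleaned up separately.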