Litmus chaos Experiment Status Fail Step: failed in chaos injection phase


I am running a cpu-hog experiment on my pod and it is failing with "Fail Step: failed in chaos injection phase". I am not seeing any logs that explain why it is failing. The experiment, service account, and result resources all seem to have been created fine, but the verdict shows that it failed. I couldn't catch the logs while the runner job was underway. I'd appreciate any help.

ref: Cpu-hog experiment yamls I am using are here

k logs litmus-8548bd-skvbt -n litmus

{"level":"info","ts":1607551992.9267251,"logger":"controller_chaosengine","msg":"Reconciling ChaosEngine","Request.Namespace":"sbs-svs","Request.Name":"sbs-abc-server-cpu-hog-chaos"}
{"level":"info","ts":1607551993.3839076,"logger":"controller_chaosengine","msg":"Reconciling ChaosEngine","Request.Namespace":"sbs-svs","Request.Name":"sbs-abc-server-cpu-hog-chaos"}
{"level":"info","ts":1607551993.4021606,"logger":"controller_chaosengine","msg":"Reconciling ChaosEngine","Request.Namespace":"sbs-svs","Request.Name":"sbs-abc-server-cpu-hog-chaos"}

k describe chaosresult sbs-abc-server-cpu-hog-chaos-pod-cpu-hog

Name:         sbs-abc-server-cpu-hog-chaos-pod-cpu-hog
Namespace:    sbs-svs
Labels:       app.kubernetes.io/component=experiment-job
              app.kubernetes.io/part-of=litmus
              app.kubernetes.io/version=1.9.1
              chaosUID=c36498b4-16f8-4b2f-93ca-601d5c72bb56
              controller-uid=8a7be18b-8eef-4190-afda-2d24cef0fcbf
              job-name=pod-cpu-hog-7iq6o6
              name=sbs-abc-server-cpu-hog-chaos-pod-cpu-hog
Annotations:  <none>
API Version:  litmuschaos.io/v1alpha1
Kind:         ChaosResult
Metadata:
  Creation Timestamp:  2020-12-09T19:36:46Z
  Generation:          2
  Managed Fields:
    API Version:  litmuschaos.io/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .:
          f:app.kubernetes.io/component:
          f:app.kubernetes.io/part-of:
          f:app.kubernetes.io/version:
          f:chaosUID:
          f:controller-uid:
          f:job-name:
          f:name:
      f:spec:
        .:
        f:engine:
        f:experiment:
      f:status:
        .:
        f:experimentstatus:
          .:
          f:failStep:
          f:phase:
          f:verdict:
    Manager:         experiments
    Operation:       Update
    Time:            2020-12-09T19:37:50Z
  Resource Version:  32768765
  Self Link:         /apis/litmuschaos.io/v1alpha1/namespaces/sbs-svs/chaosresults/sbs-abc-server-cpu-hog-chaos-pod-cpu-hog
  UID:               38b0ad59-e153-4d6a-a099-ee3dad2579df
Spec:
  Engine:      sbs-abc-server-cpu-hog-chaos
  Experiment:  pod-cpu-hog
Status:
  Experimentstatus:
    Fail Step:  failed in chaos injection phase
    Phase:      Completed
    Verdict:    Fail
Events:         <none>

There is 1 answer

Answered by sbolla

The kill-container command wasn't working correctly for the distribution I have. The following command worked for me. I updated the env variable in the engine YAML:

- name: CHAOS_KILL_COMMAND
  value: "kill $(find /proc -name exe -lname '*/md5sum' 2>&1 | grep -v 'Permission denied' | awk -F/ '{print $(NF-1)}' |  head -n 1)"
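For context, here is a rough sketch of where that variable would sit in the ChaosEngine manifest. It assumes the usual Litmus 1.x layout, in which per-experiment env overrides go under `spec.experiments[].spec.components.env`. The engine name, namespace, and experiment name are taken from the question. The `jobCleanUpPolicy: retain` line is an optional addition, not part of the original fix, which should keep the experiment job around so its pod logs can still be read after a failure. Everything else in the engine is left out here and would stay as it already is.

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: sbs-abc-server-cpu-hog-chaos   # engine name from the question
  namespace: sbs-svs                   # namespace from the question
spec:
  # appinfo, chaosServiceAccount and the other engine fields are omitted
  # from this sketch; keep whatever the existing engine YAML already has.

  # Optional, not part of the original fix: retain the experiment job after
  # it finishes so its logs can be inspected later.
  jobCleanUpPolicy: retain
  experiments:
    - name: pod-cpu-hog
      spec:
        components:
          env:
            # Looks for /proc/<pid>/exe symlinks that point at md5sum,
            # takes the first PID found, and kills that process.
            - name: CHAOS_KILL_COMMAND
              value: "kill $(find /proc -name exe -lname '*/md5sum' 2>&1 | grep -v 'Permission denied' | awk -F/ '{print $(NF-1)}' | head -n 1)"
```

With the job retained, the experiment pod can usually be found through the `job-name` label shown in the ChaosResult above (`job-name=pod-cpu-hog-7iq6o6` in this run), for example with `kubectl logs -l job-name=pod-cpu-hog-7iq6o6 -n sbs-svs`.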