Unable to run Argo workflow due to an opaque error

I want to trigger a workflow manually in Argo. I am using OpenShift and Argo CD. My scheduled workflows run successfully in Argo, but one workflow fails when I trigger it manually.

The workflow in question is:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: "obslytics-data-exporter-manual-workflow-"
  labels:
    owner: "obslytics-remote-reader"
    app: "obslytics-data-exporter"
    pipeline: "obslytics-data-exporter"
spec:
  arguments:
    parameters:
      - name: start_timestamp
        value: "2020-11-18T20:00:00Z"
  entrypoint: manual-trigger
  templates:
    - name: manual-trigger
      steps:
        - - name: trigger
            templateRef:
              name: "obslytics-data-exporter-workflow-triggers"
              template: trigger-workflow
  volumes:
    - name: "obslytics-data-exporter-workflow-secrets"
      secret:
        secretName: "obslytics-data-exporter-workflow-secrets"

When I run the command:

argo submit trigger.local.yaml

The build pod has completed, but most of the workflow pods fail:

➜  dh-workflow-obslytics git:(master) ✗ oc get pods                                                        
NAME                                                       READY     STATUS      RESTARTS   AGE
argo-ui-7fcf5ff95-9k8cc                                    1/1       Running     0          3d
gateway-controller-76bb888f7b-lq84r                        1/1       Running     0          3d
obslytics-data-exporter-1-build                            0/1       Completed   0          3d
obslytics-data-exporter-calendar-gateway-fbbb8d7-zhdnf     2/2       Running     1          3d
obslytics-data-exporter-manual-workflow-m7jdg-1074461258   0/2       Error       0          4m
obslytics-data-exporter-manual-workflow-m7jdg-1477271209   0/2       Error       0          4m
obslytics-data-exporter-manual-workflow-m7jdg-1544087495   0/2       Error       0          4m
obslytics-data-exporter-manual-workflow-m7jdg-1979266120   0/2       Completed   0          4m
obslytics-data-exporter-sensor-6594954795-xw8fk            1/1       Running     0          3d
opendatahub-operator-8994ddcf8-v8wxm                       1/1       Running     0          3d
sensor-controller-58bdc7c4f4-9h4jw                         1/1       Running     0          3d
workflow-controller-759649b79b-s69l7                       1/1       Running     0          3d
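To narrow a listing like the one above down to the failing workflow pods, the output can be filtered on the name prefix and the STATUS column. This is only a sketch: the rows are copied from the listing into a here-document so the snippet runs without a cluster, and against a live cluster the input would be the output of `oc get pods` instead.

```shell
#!/bin/sh
# Keep rows whose NAME starts with the manual-workflow prefix and whose
# STATUS column (third field) is "Error"; skip the header row.
failed=$(awk 'NR > 1 && $1 ~ /^obslytics-data-exporter-manual-workflow-/ && $3 == "Error" { print $1 }' <<'EOF'
NAME                                                       READY     STATUS      RESTARTS   AGE
obslytics-data-exporter-1-build                            0/1       Completed   0          3d
obslytics-data-exporter-manual-workflow-m7jdg-1074461258   0/2       Error       0          4m
obslytics-data-exporter-manual-workflow-m7jdg-1477271209   0/2       Error       0          4m
obslytics-data-exporter-manual-workflow-m7jdg-1544087495   0/2       Error       0          4m
obslytics-data-exporter-manual-workflow-m7jdg-1979266120   0/2       Completed   0          4m
EOF
)
# One failing pod name per line.
printf '%s\n' "$failed"
```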

The pods whose names start with obslytics-data-exporter-manual-workflow are the ones that fail. When I try to debug by describing one of them:

➜  dh-workflow-obslytics git:(master) ✗ oc describe pods/obslytics-data-exporter-manual-workflow-4hzqz-3278280317
Name:               obslytics-data-exporter-manual-workflow-4hzqz-3278280317
Namespace:          dh-dev-argo
Priority:           0
PriorityClassName:  <none>
Node:               avsrivas-dev-ocp-3.11/10.0.111.224
Start Time:         Tue, 24 Nov 2020 07:27:57 -0500
Labels:             workflows.argoproj.io/completed=true
                    workflows.argoproj.io/workflow=obslytics-data-exporter-manual-workflow-4hzqz
Annotations:        openshift.io/scc=restricted
                    workflows.argoproj.io/node-message=timeout after 0s
                    workflows.argoproj.io/node-name=obslytics-data-exporter-manual-workflow-4hzqz[0].trigger[1].run[0].metric-split(0:cluster_version)[0].process-metric(0)
                    workflows.argoproj.io/template={"name":"run-obslytics","arguments":{},"inputs":{"parameters":[{"name":"metric","value":"cluster_version"},{"name":"start_timestamp","value":"2020-11-18T20:00:00Z"},{"na...
Status:             Failed
IP:                 10.128.0.69
Controlled By:      Workflow/obslytics-data-exporter-manual-workflow-4hzqz
Init Containers:
  init:
    Container ID:  docker://25b95c684ef66b13520ba9deeba353082142f3bb39bafe443ee508074c58047e
    Image:         argoproj/argoexec:v2.4.2
    Image ID:      docker-pullable://docker.io/argoproj/argoexec@sha256:4e393daa6ed985cf680bcf0ecf04f7b0758940f0789505428331fcfe99cce06b
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      init
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 24 Nov 2020 07:27:59 -0500
      Finished:     Tue, 24 Nov 2020 07:27:59 -0500
    Ready:          True
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:                    obslytics-data-exporter-manual-workflow-4hzqz-3278280317 (v1:metadata.name)
      ARGO_CONTAINER_RUNTIME_EXECUTOR:  k8sapi
    Mounts:
      /argo/podmetadata from podmetadata (rw)
      /argo/staging from argo-staging (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qpggm (ro)
Containers:
  wait:
    Container ID:  docker://a94e7f1bc1cfec4c8b549120193b697c91760bb8f3af414babef1d6f7ccee831
    Image:         argoproj/argoexec:v2.4.2
    Image ID:      docker-pullable://docker.io/argoproj/argoexec@sha256:4e393daa6ed985cf680bcf0ecf04f7b0758940f0789505428331fcfe99cce06b
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
    State:          Terminated
      Reason:       Completed
      Message:      timeout after 0s
      Exit Code:    0
      Started:      Tue, 24 Nov 2020 07:28:00 -0500
      Finished:     Tue, 24 Nov 2020 07:28:01 -0500
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:                    obslytics-data-exporter-manual-workflow-4hzqz-3278280317 (v1:metadata.name)
      ARGO_CONTAINER_RUNTIME_EXECUTOR:  k8sapi
    Mounts:
      /argo/podmetadata from podmetadata (rw)
      /mainctrfs/argo/staging from argo-staging (rw)
      /mainctrfs/etc/obslytics-data-exporter from obslytics-data-exporter-workflow-secrets (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qpggm (ro)
  main:
    Container ID:  docker://<some_id>
    Image:         docker-registry.default.svc:5000/<some_id>
    Image ID:      docker-pullable://docker-registry.default.svc:5000/<some_id>
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -e
    Args:
      /argo/staging/script
    State:          Terminated
      Reason:       Error
      Exit Code:    126
      Started:      Tue, 24 Nov 2020 07:28:01 -0500
      Finished:     Tue, 24 Nov 2020 07:28:01 -0500
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  1Gi
    Requests:
      memory:     1Gi
    Environment:  <none>
    Mounts:
      /argo/staging from argo-staging (rw)
      /etc/obslytics-data-exporter from obslytics-data-exporter-workflow-secrets (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-qpggm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  podmetadata:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  obslytics-data-exporter-workflow-secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  obslytics-data-exporter-workflow-secrets
    Optional:    false
  argo-staging:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  default-token-qpggm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-qpggm
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type    Reason     Age   From                            Message
  ----    ------     ----  ----                            -------
  Normal  Scheduled  27m   default-scheduler               Successfully assigned dh-dev-argo/obslytics-data-exporter-manual-workflow-4hzqz-3278280317 to avsrivas-dev-ocp-3.11
  Normal  Pulled     27m   kubelet, avsrivas-dev-ocp-3.11  Container image "argoproj/argoexec:v2.4.2" already present on machine
  Normal  Created    27m   kubelet, avsrivas-dev-ocp-3.11  Created container
  Normal  Started    27m   kubelet, avsrivas-dev-ocp-3.11  Started container
  Normal  Pulled     27m   kubelet, avsrivas-dev-ocp-3.11  Container image "argoproj/argoexec:v2.4.2" already present on machine
  Normal  Created    27m   kubelet, avsrivas-dev-ocp-3.11  Created container
  Normal  Started    27m   kubelet, avsrivas-dev-ocp-3.11  Started container
  Normal  Pulling    27m   kubelet, avsrivas-dev-ocp-3.11  pulling image "docker-registry.default.svc:5000/dh-dev-argo/obslytics-data-exporter:latest"
  Normal  Pulled     27m   kubelet, avsrivas-dev-ocp-3.11  Successfully pulled image "docker-registry.default.svc:5000/dh-dev-argo/obslytics-data-exporter:latest"
  Normal  Created    27m   kubelet, avsrivas-dev-ocp-3.11  Created container
  Normal  Started    27m   kubelet, avsrivas-dev-ocp-3.11  Started container

All I learn from the description above is that the main container terminated with an error (exit code 126). I cannot find any error message that would help me debug the issue.

When I watch the workflow with argo watch, the output is:
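One detail in the describe output may be worth following up: the main container runs `/bin/sh -e /argo/staging/script` and exits with code 126. By POSIX shell convention, 126 means a command was found but could not be executed, typically because of a permission problem. The sketch below reproduces that status locally under the assumption that the script invokes a file without the execute bit. The file names `tool` and `script` are hypothetical stand-ins, and this is not a confirmed diagnosis of the workflow above.

```shell
#!/bin/sh
# Local reproduction of exit status 126 (command found but not executable).
workdir=$(mktemp -d)

# Hypothetical helper that exists but has no execute permission.
printf '#!/bin/sh\necho hello\n' > "$workdir/tool"
chmod 644 "$workdir/tool"

# Stand-in for /argo/staging/script: its only command is the helper.
printf '%s\n' "$workdir/tool" > "$workdir/script"

# Run it the same way the main container does and capture the status.
status=0
/bin/sh -e "$workdir/script" 2> "$workdir/stderr" || status=$?

echo "exit status: $status"
cat "$workdir/stderr"   # the shell's own error message for the failed command
```

In the cluster, the equivalent error message would be in the main container's output, which can be read with `oc logs <pod-name> -c main`, where `<pod-name>` is one of the failing pods.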

Name:                obslytics-data-exporter-manual-workflow-8wzcc
Namespace:           dh-dev-argo
ServiceAccount:      default
Status:              Running
Created:             Tue Nov 24 08:01:10 -0500 (8 minutes ago)
Started:             Tue Nov 24 08:01:10 -0500 (8 minutes ago)
Duration:            8 minutes 10 seconds
Progress:            
Parameters:          
  start_timestamp:   2020-11-18T20:00:00Z

STEP                                              TEMPLATE                                                    PODNAME                                                   DURATION  MESSAGE
 ● obslytics-data-exporter-manual-workflow-8wzcc  manual-trigger                                                                                                                                             
 └───● trigger                                    obslytics-data-exporter-workflow-triggers/trigger-workflow                                                                                                 
     ├───✔ get-labels(0)                          obslytics-data-exporter-workflow-template/get-labels        obslytics-data-exporter-manual-workflow-8wzcc-2604296472  6s                                   
     └───● run                                    obslytics-data-exporter-workflow-template/init                                                                                                             
         └───● metric-split(0:cluster_version)    metric-worker                                                                                                                                              
             └───● process-metric                 run-obslytics                                                                                                                                              
                 ├─✖ process-metric(0)            run-obslytics                                               obslytics-data-exporter-manual-workflow-8wzcc-4222496183  6s        failed with exit code 126  
                 └─◷ process-metric(1)            run-obslytics                                               obslytics-data-exporter-manual-workflow-8wzcc-531670266   7m        PodInitializing        