Argo Cron Workflow Not Running with emptyDir Specified

I have been tasked with building an ETL pipeline. My code works with Docker Compose, and so far I have been able to create the tables and load all of the data into them. Now I have to create a cron workflow that schedules this task. Two volumes get mounted right now: one for a configuration file that holds the secrets the code needs to run, and another for a payload file that is used for mapping attributes.

When I specify the two volumes without emptyDir: {}, my containers immediately error out, but the describe output on the CronWorkflow shows that it did indeed run and started again. However, there is a read-only file system error when the program opens the CSV file that gets staged for data insertion.

Here is my CronWorkflow with emptyDir specified on both volumes:

apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  generateName: dataeng-github-metrics-
  namespace: dataops
spec:
  schedule: "*/1 * * * *" # run every 1 minute
  concurrencyPolicy: "Replace"
  startingDeadlineSeconds: 0
  workflowSpec:
    #volumes:
    #- name: my-secret-vol
    #secret:
    #secretName: my-secret
    # - CONNECTION_STRING=${SNOWFLAKE_USER}:${SNOWFLAKE_PASSWORD}@${SNOWFLAKE_ACCOUNT}/MIRANTIS/DATAENG
    # - DATABASE_BACKEND=snowflake
    # - DATAENG_CONFIG_PATH=/.dataeng/config.yaml
    # - PAYLOAD_TEAMS_CONFIG_PATH=/payloads/teams.yaml
    # - TEAMS_SPEC=all_users
    # - SCHEMA=MIRANTIS
    # - DATABASE=DATAENG
    # - TABLE=GITHUB_CONTRIBUTIONS_STAGE
    entrypoint: run-sync
    templates:
    - name: run-sync
      container:
        imagePullPolicy: Always
        image: msr.ci.mirantis.com/dataeng/dataeng_github_metrics:latest
        imagePullSecrets:
        - name: msrregcred
          namespace: dataops
        args: ['--log-level', 'debug']
        env:
        - name: CONNECTION_STRING
          valueFrom:
            secretKeyRef:
              name: connection-string
              key: CONNECTION_STRING
        - name: DATAENG_CONFIG_PATH
          value: /.dataeng/config.yaml
        - name: DATABASE_BACKEND
          value: snowflake
        - name: PAYLOAD_TEAMS_CONFIG_PATH
          value: /payloads/teams.yaml
        - name: TEAMS_SPEC
          value: all_users
        - name: SCHEMA
          value: MIRANTIS
        - name: DATABASE
          value: DATAENG
        - name: TABLE
          value: GITHUB_CONTRIBUTIONS_STAGE
        volumeMounts:
          - mountPath: /.dataeng
            name: config
          - mountPath: /payloads
            name: teamspayload
      volumes:
        - name: config
          emptyDir: {}
          secret:
            secretName: config
            optional: false
        - name: teamspayload
          emptyDir: {}
          configMap: 
            name: teamspayload

When emptyDir is specified, no pods get spun up, and the describe output of the workflow in question shows no events; it is just empty. When I don't specify it, a pod with two containers, main and wait, gets spun up:

apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  generateName: dataeng-github-metrics-
  namespace: dataops
spec:
  schedule: "*/1 * * * *" # run every 1 minute
  concurrencyPolicy: "Replace"
  startingDeadlineSeconds: 0
  workflowSpec:
    #volumes:
    #- name: my-secret-vol
    #secret:
    #secretName: my-secret
    # - CONNECTION_STRING=${SNOWFLAKE_USER}:${SNOWFLAKE_PASSWORD}@${SNOWFLAKE_ACCOUNT}/MIRANTIS/DATAENG
    # - DATABASE_BACKEND=snowflake
    # - DATAENG_CONFIG_PATH=/.dataeng/config.yaml
    # - PAYLOAD_TEAMS_CONFIG_PATH=/payloads/teams.yaml
    # - TEAMS_SPEC=all_users
    # - SCHEMA=MIRANTIS
    # - DATABASE=DATAENG
    # - TABLE=GITHUB_CONTRIBUTIONS_STAGE
    entrypoint: run-sync
    templates:
    - name: run-sync
      container:
        imagePullPolicy: Always
        image: msr.ci.mirantis.com/dataeng/dataeng_github_metrics:latest
        imagePullSecrets:
        - name: msrregcred
          namespace: dataops
        args: ['--log-level', 'debug']
        env:
        - name: CONNECTION_STRING
          valueFrom:
            secretKeyRef:
              name: connection-string
              key: CONNECTION_STRING
        - name: DATAENG_CONFIG_PATH
          value: /.dataeng/config.yaml
        - name: DATABASE_BACKEND
          value: snowflake
        - name: PAYLOAD_TEAMS_CONFIG_PATH
          value: /payloads/teams.yaml
        - name: TEAMS_SPEC
          value: all_users
        - name: SCHEMA
          value: MIRANTIS
        - name: DATABASE
          value: DATAENG
        - name: TABLE
          value: GITHUB_CONTRIBUTIONS_STAGE
        volumeMounts:
          - mountPath: /.dataeng
            name: config
          - mountPath: /payloads
            name: teamspayload
      volumes:
        - name: config
          secret:
            secretName: config
            optional: false
        - name: teamspayload
          configMap: 
            name: teamspayload

This produces one pod with two containers, main and wait:

$ kubectl get pods -n dataops
NAME                                      READY   STATUS   RESTARTS   AGE
dataeng-github-metrics-2q9b2-1649718900   0/2     Error    0          2m46s

On the wait container I see this from the logs:

$ kubectl logs -n dataops dataeng-github-metrics-l2nvh-1649728560 -c wait
time="2022-04-12T01:56:20.292Z" level=info msg="listed containers" containers="map[main:{d19b209fb5de511d38bda02aa9bcf8f58fe34b60e25128225cd976944717dbb9 Exited {0 63785325361 <nil>}} wait:{c6a5abdcffb229235f68d1db3acc249be81781e8bb326d1ef3b867835e164704 Up {0 63785325360 <nil>}}]"
time="2022-04-12T01:56:20.323Z" level=info msg="listed containers" containers="map[main:{d19b209fb5de511d38bda02aa9bcf8f58fe34b60e25128225cd976944717dbb9 Exited {0 63785325361 <nil>}} wait:{c6a5abdcffb229235f68d1db3acc249be81781e8bb326d1ef3b867835e164704 Up {0 63785325360 <nil>}}]"
time="2022-04-12T01:56:20.323Z" level=info msg="Killing sidecars []"
time="2022-04-12T01:56:20.323Z" level=info msg="Alloc=5137 TotalAlloc=10211 Sys=73809 NumGC=3 Goroutines=7"

On the main container I see this from the logs:

$ kubectl logs -n dataops dataeng-github-metrics-l2nvh-1649728560 -c main
2022/04/12 01:56:20 failed to insert data: open /payloads/dataeng_github_metrics.csv: read-only file system
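
The path in this error sits under /payloads, which is the configMap mount. Kubernetes mounts configMap and secret volumes read-only, so the program presumably cannot create its staging CSV there. The following is a minimal sketch of one possible layout, not a confirmed fix. It assumes the configMap stores the mapping file under a key named teams.yaml (the key name is not shown above) and uses a hypothetical volume name, scratch. An emptyDir provides a writable /payloads, and the single mapping file is mounted into it via subPath:

```yaml
# Sketch only: fragment of the run-sync template, not a complete CronWorkflow.
# "scratch" is a hypothetical volume name; the key "teams.yaml" is assumed.
container:
  volumeMounts:
    - name: config
      mountPath: /.dataeng
      readOnly: true
    - name: scratch               # writable directory for the staged CSV
      mountPath: /payloads
    - name: teamspayload          # read-only mapping file inside the writable dir
      mountPath: /payloads/teams.yaml
      subPath: teams.yaml
volumes:
  - name: config
    secret:
      secretName: config
  - name: scratch
    emptyDir: {}
  - name: teamspayload
    configMap:
      name: teamspayload
```

Files mounted through subPath do not pick up later changes to the configMap, which should not matter much for a short-lived scheduled job.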

Why aren't the pods coming up when I specify emptyDir: {} on the volumes that mount the payload configMap and the config secret into the container?
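
A likely explanation, offered as a sketch and not a confirmed diagnosis: in the Kubernetes API each entry under volumes may name exactly one volume source, so an entry that sets both emptyDir and secret (or emptyDir and configMap) fails validation. That would be consistent with no pod ever being created. The fragment below contrasts the two forms; scratch is a hypothetical name for a separate writable volume.

```yaml
# Sketch only. Each volume entry carries exactly one source.

# Rejected: two sources on the same volume
# - name: config
#   emptyDir: {}
#   secret:
#     secretName: config

# Accepted: one source per volume; writable space gets its own entry
volumes:
  - name: config
    secret:
      secretName: config
  - name: scratch        # hypothetical name for writable scratch space
    emptyDir: {}
```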
