I am running the official Airflow Helm chart from the apache-airflow/airflow repo on a kind cluster. Part of the data this Airflow instance was set up to ETL lives on the host file system, so I have been trying to mount a directory from the host into the worker pods in the Kubernetes cluster. I've tried several approaches, including ConfigMaps and persistent volume claims, all of which result in the directory being created in the pods but not linked to the host machine: files created in a pod aren't visible on the host, and vice versa.
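For context, the PVC attempt was along these lines (a minimal sketch; the names, storage class, and size are illustrative placeholders, not my exact manifests):

# Illustrative hostPath-backed PV/PVC pair
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-files-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /home/username/airflow-kube/data-files
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-files-pvc
  namespace: airflow
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 1Gi

The claim was then referenced from the chart values in the same way as the hostPath example below; either way the mounted directory shows up in the pods but isn't linked to the host.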
My question, first and foremost, is: am I going about this the right way? I know that mounting host paths raises security concerns, but it is unavoidable that the files are on the host machine. Is there a better way to approach this problem that I'm missing? Switching to LocalExecutor is an option, if that helps.
If this is the best way, then what exactly am I doing wrong here? Please let me know what information I can provide to help troubleshoot this issue.
A simple example of the volume setup in values.yaml that replicates the problem, assuming a working cluster:
Grab values.yaml:
helm show values apache-airflow/airflow > values.yaml
Edit the file at around line 278 onward to include:
# Volumes for all airflow containers
volumes:
  - name: mydir
    hostPath:
      # Ensure the file directory is created.
      path: /home/username/airflow-kube/data-files
      type: DirectoryOrCreate

# VolumeMounts for all airflow containers
volumeMounts:
  - mountPath: /mnt/data-files
    name: mydir
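If the top-level settings turn out to be the wrong place for this, the chart also appears to expose worker-scoped equivalents (workers.extraVolumes and workers.extraVolumeMounts in the chart versions I've looked at; double-check the names against your values.yaml), which as I understand it would look roughly like:

# Worker-scoped variant of the same mount
workers:
  extraVolumes:
    - name: mydir
      hostPath:
        path: /home/username/airflow-kube/data-files
        type: DirectoryOrCreate
  extraVolumeMounts:
    - name: mydir
      mountPath: /mnt/data-files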
Upgrade the release to include the changes:
helm upgrade --install airflow apache-airflow/airflow --namespace airflow --debug --timeout 15m0s -f values.yaml
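For reference, this is roughly how I've been checking whether the directory is shared (the component=worker label is what I see on my worker pods, but selectors may differ by chart version, and <worker-pod> is a placeholder):

# Find a worker pod
kubectl -n airflow get pods -l component=worker

# Create a file from inside the pod
kubectl -n airflow exec -it <worker-pod> -- touch /mnt/data-files/test-from-pod

# Check whether it shows up on the host
ls -la /home/username/airflow-kube/data-files

The mount point exists inside the pod, but nothing written there appears on the host, and files created on the host don't appear in the pod.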
This particular example is cribbed from https://kubernetes.io/docs/concepts/storage/volumes/#hostpath