Airflow KubernetesExecutor: logs do not show up in the UI until executor pods complete


I have started using the KubernetesExecutor and have set up a PV/PVC backed by AWS EFS to store logs for my DAGs. I am also using S3 remote logging.

All the logging works perfectly once a DAG completes. However, for long-running jobs I want to be able to see the logs while they are running.

When I exec into my scheduler pod while an executor pod is running, I can see the .log file of the currently running job thanks to the shared EFS. However, when I cat the log file, I see no logs for as long as the executor is still running. Once the executor finishes, I can see the full logs both when I cat the file and in the Airflow UI.

Oddly, when I exec into the executor pod while it is running and cat the exact same log file on the shared EFS, I can see the correct logs up to that point in the job; and if I then immediately cat from the scheduler or check the UI, I can also see the logs up to that point.

So it seems that cat-ing the file from within the executor pod causes the logs to be flushed in some way, making them available everywhere. Why are the logs not being flushed regularly?

Here are the config variables I am setting. Note that these env variables are set in my webserver/scheduler and executor pods:

# ----------------------
# For Main Airflow Pod (Webserver & Scheduler)
# ----------------------
export PYTHONPATH=$HOME
export AIRFLOW_HOME=$HOME
export PYTHONUNBUFFERED=1

# Core configs
export AIRFLOW__CORE__LOAD_EXAMPLES=False
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=${AIRFLOW__CORE__SQL_ALCHEMY_CONN:-postgresql://$DB_USER:$DB_PASSWORD@$DB_HOST:5432/$DB_NAME}
export AIRFLOW__CORE__FERNET_KEY=$FERNET_KEY
export AIRFLOW__CORE__DAGS_FOLDER=$AIRFLOW_HOME/git/dags/$PROVIDER-$ENV/

# Logging configs
export AIRFLOW__LOGGING__BASE_LOG_FOLDER=$AIRFLOW_HOME/logs/
export AIRFLOW__LOGGING__REMOTE_LOGGING=True
export AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=aws_default
export AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://path-to-bucket/airflow_logs
export AIRFLOW__LOGGING__TASK_LOG_READER=s3.task
export AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS=config.logging_config.LOGGING_CONFIG

# Webserver configs
export AIRFLOW__WEBSERVER__COOKIE_SAMESITE=None

My logging config looks like the one in the question here.

I thought this could be a Python buffering issue, so I added PYTHONUNBUFFERED=1, but that didn't help. This happens whether I use the PythonOperator or the BashOperator.
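One quick sanity check (a sketch, assuming CPython 3.7+ inside the pod) that PYTHONUNBUFFERED is actually reaching the interpreter: when the variable is set, CPython creates sys.stdout with write_through=True on the text layer, so the flag can be verified directly:

```shell
# With PYTHONUNBUFFERED set, CPython 3.7+ builds sys.stdout unbuffered
# (write_through=True); without it, a piped stdout is block-buffered.
PYTHONUNBUFFERED=1 python3 -c 'import sys; print(sys.stdout.write_through)'
# prints: True
```

If this prints False inside a task pod, the variable is not making it into the task's environment.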

Is it the case that KubernetesExecutor logs just won't be available during runtime, only after? Or is there some configuration I must be missing?


There are 2 answers

Answer from SergiyKolesnikov:

Airflow is working as documented in this case:

The logs only appear in your DFS after the task has finished.

Answer from Jedrzej G:

I had the same issue, and these are the things that helped me; worth checking them on your end:

  • PYTHONUNBUFFERED=1 is not enough on its own, but it is necessary to view logs in real time. Please keep it.
  • Have EFS mounted in the web, scheduler, and pod_template (executor) pods.
  • Your experience of the log file only being complete after the task has finished makes me wonder whether the PVC you use for logs has the ReadWriteMany access mode.
  • Are the paths you cat in the different pods identical? Do they include the full task format, e.g. efs/logs/dag_that_executes_via_KubernetesPodOperator/task1/2021-09-21T19\:00\:21.894859+00\:00/1.log ? I ask because, before I had EFS hooked up in every place (scheduler, web, pod_template), I could only access executor logs that did not include the task name and task time.
  • Have the EFS logs folder owned by the airflow user (uid 50000 in my case; you may have to set this up from somewhere else), group root, mode 755.
  • Do not set AIRFLOW__LOGGING__LOGGING_CONFIG_CLASS. Try to get things running as vanilla as possible before introducing a custom logging config.
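The ownership and mode items above can be verified with a short script. A sketch, assuming uid 50000 for the airflow user; a scratch directory stands in for the EFS mount here so it is safe to run anywhere (in a real pod, point LOGS_DIR at the actual mount). The PVC access mode itself can be inspected with `kubectl get pvc <name> -o jsonpath='{.spec.accessModes}'` and should include ReadWriteMany.

```shell
# Stand-in for the EFS mount; in a pod use e.g. LOGS_DIR="$AIRFLOW_HOME/logs"
LOGS_DIR=$(mktemp -d)

# Ownership recommended above: airflow (uid 50000), group root, mode 755.
# chown usually needs root; an initContainer or a pod fsGroup is the typical fix.
chown -R 50000:0 "$LOGS_DIR" 2>/dev/null || echo "chown needs root (use an initContainer or fsGroup)"
chmod 755 "$LOGS_DIR"

# Show what the executor pods will actually see:
stat -c 'mode=%a uid=%u gid=%g' "$LOGS_DIR"
```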

If you have remote logging set up, I understand that after the task completes the first line in the UI will say Reading remote log from. But what does the first line say for you while the task is running: does it mention the remote log, or a local log file?

  • If it mentions remote, that would mean you don't have EFS hooked up in every place.
  • If it mentions local, I would check your EFS settings (ReadWriteMany) and the directory ownership and mode.
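A small sketch of that check. The strings below are the prefixes the S3 and file task handlers print on the first log line; the exact wording can vary by Airflow version, so treat them (and the sample path) as assumptions:

```shell
# Sample first line of a task log as shown in the UI (path is illustrative)
first_line='*** Reading remote log from s3://path-to-bucket/airflow_logs/mydag/mytask/1.log.'

case "$first_line" in
  *"Reading remote log"*)
    echo "served from S3: while the task is running, this suggests EFS is not mounted everywhere" ;;
  *"Reading local file"*)
    echo "served from the local/shared file: check ReadWriteMany, ownership, and mode" ;;
  *)
    echo "unrecognized handler output" ;;
esac
```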