We are currently building a Dataflow on the GCP. The purpose of the pipeline is to read from a Kafka topic on an external cluster and write the entries into a BigQuery table. Due to extensive authentication, we need to create an own Dockerfile to function for the WorkerVM (to include the truststore file in the file path). But when we run that, we receive the error ""StatusRuntimeException: UNAVAILABLE: Finding sdk for status channel failed. SDK harness not connected with control channel."
The Beam pipeline itself is started from a Flex Template Base Image (python39-template-launcher). The Beam pipeline is running on Python 3.9 and Apache Beam 2.49.0, so the version can not be the problem.
This is my Dockerfile for the Worker VM:
FROM apache/beam_python3.9_sdk:2.49.0
RUN apt-get update
COPY ./ca.pem /secret_stuff/ca.pem
COPY ./kafka_test_truststore_pwd.txt /secret_stuff/kafka_test_truststore_pwd.txt
COPY ./help.sh /secret_stuff/help.sh
RUN echo "vielleicht wird jetzt worker sdk gestartet"
RUN chmod +x /secret_stuff/help.sh
ENTRYPOINT ["bash", "-c", "-e", "/secret_stuff/help.sh"]
The help.sh-script is just for generating the truststore and as its last step it does /opt/apache/beam/boot "$@" to start the Worker SDK. I am extending the Beam arguments in the Python code like that:
beam_args.extend(["--sdk_container_image=eu.gcr.io/{project}/{folder}/{the dockerfile}"])
beam_args.extend(["--experiments=use_runner_v2"])
beam_args.extend(["--sdk_location=container"])
What is the error? How should I make the Worker VM Dockerfile?