Currently, I am executing my spark-submit commands in Airflow by SSHing into the cluster with the BashOperator. Our client no longer allows us to SSH into the cluster. Is it possible to execute a spark-submit command from Airflow without SSHing into the cluster?
Trigger spark submit jobs from airflow on Dataproc Cluster without SSH
1.1k views · Asked by Kriz
There is 1 answer
You can use DataprocSubmitJobOperator to submit jobs to Dataproc from Airflow, with no SSH into the cluster required. Just make sure to pass the correct parameters to the operator. Note that the job parameter is a dictionary based on the Dataproc Job resource, so you can use this operator to submit different job types such as PySpark, Pig, Hive, etc. The code below submits a PySpark job:
Airflow run: (screenshot omitted)
Airflow logs: (screenshot omitted)
Dataproc job: (screenshot omitted)