Connection issue in Spark-Hadoop standalone mode when connecting between machines on a LAN


I am using the Spark-Hadoop 3.3.2 bundle.

On IP 192.168.1.4x: I have started the Spark master (port 7077) and a worker.

On IP 192.168.1.22y: I have my webapp.py, which:

a. creates a Spark session (see the config below):

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("dataHudi") \
    .master('spark://192.168.1.40:7077') \
    .config('spark.submit.deployMode', 'client') \
    .config('spark.driver.bindAddress', '192.168.1.40') \
    .config('spark.driver.host', '192.168.1.40') \
    .config('spark.driver.port', '33037') \
    .config('spark.jars.packages', 'org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.1') \
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .config('spark.sql.catalog.spark_catalog', 'org.apache.spark.sql.hudi.catalog.HoodieCatalog') \
    .config('spark.sql.extensions', 'org.apache.spark.sql.hudi.HoodieSparkSessionExtension') \
    .getOrCreate()
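One thing that may be worth checking in the config above: in client deploy mode the driver runs inside webapp.py, so spark.driver.host has to be an address of the webapp machine that the executors can reach, and spark.driver.bindAddress has to be an address that exists on that machine. Here both are set to 192.168.1.40, which looks like the master's address. The following is only a sketch of how the driver-side settings could be derived. The helper names (local_ip_towards, driver_network_conf, apply_conf) and the block manager port 33038 are assumptions made for illustration, not part of the original setup.

```python
import socket


def local_ip_towards(host: str, port: int = 7077) -> str:
    """Return the local IP the OS would use to reach host:port.

    A UDP connect() only selects a route; no packet is sent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((host, port))
        return s.getsockname()[0]


def driver_network_conf(driver_ip: str,
                        driver_port: int = 33037,
                        block_manager_port: int = 33038) -> dict:
    """Build Spark settings that pin the driver to one address and fixed ports.

    driver_ip must belong to the machine running the driver (the webapp),
    not to the master. 33038 is an arbitrary port assumed for this sketch.
    """
    return {
        "spark.driver.host": driver_ip,          # address executors connect back to
        "spark.driver.bindAddress": driver_ip,   # local address the driver listens on
        "spark.driver.port": str(driver_port),
        "spark.driver.blockManager.port": str(block_manager_port),
    }


def apply_conf(builder, conf: dict):
    """Apply every key/value pair through builder.config(key, value)."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder
```

On the webapp machine this could be used as apply_conf(SparkSession.builder.master('spark://192.168.1.40:7077'), driver_network_conf(local_ip_towards('192.168.1.40'))), followed by the remaining Hudi settings and getOrCreate(). Fixing both ports also makes it easier to open exactly those ports in a firewall on the webapp machine.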

b. submits a job:

        spark_df = spark.createDataFrame(ingested_df)

        spark_df.write \
            .format("org.apache.hudi") \
            .options(**hudi_options) \
            .mode("append") \
            .save(f"{basePath_ID}/{unique_filename}")

When I check the logs at 192.168.1.4x:8080 and 192.168.1.4x:8081, I see that the application is running and that executors keep exiting and restarting. When I then check the executor stderr and stdout logs, I see that Spark is trying to connect to a random port, say 33037, and that the connection on that port fails.
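To narrow down whether this is a plain network problem (wrong address, firewall) rather than a Spark problem, a TCP reachability check could be run on the worker machine against the driver's address and port while the application is up. This is a minimal sketch using only the standard library; the function name can_connect and the 3 second timeout are assumptions chosen for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling can_connect with the webapp machine's address and port 33037 from the worker machine shows whether the executors have any chance of reaching the driver. If it returns False while the application is running, the likely suspects are the advertised driver address or a firewall on the webapp machine.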

I also tried running both Spark and the application on the same machine, at IP 192.168.1.22y, and this worked.

But across the LAN it fails. We tried different configurations, changing the bind address, driver.host, driver.port, and so on.
