Spark Scala HTTP Response from REST API Timeout Exception

I have to read a JSON response from a REST API using Spark with Scala. I have written the code two ways (using scala.io.Source.fromInputStream and using Scalaj HTTP), but the job does not run on HDFS: every time it fails with a timeout exception, even though I have increased the connection and read timeouts to the maximum.

In IntelliJ (local) it works fine. In the HDFS logs I cannot find anything apart from the timeout exception, but they do suggest the job is still using the default timeout of 100 ms rather than the maximum value I set in my code.

Below are the logs:

```
21/08/19 11:17:54 INFO jdk.JdkHttpClient: connect timeout: Period{time=100, timeUnit=MILLISECONDS}, read timeout: Period{time=100, timeUnit=MILLISECONDS}, shutdown timeout: Period{time=10, timeUnit=MILLISECONDS}
21/08/19 11:17:54 INFO jdk.JdkHttpClient: connect timeout: Period{time=100, timeUnit=MILLISECONDS}, read timeout: Period{time=100, timeUnit=MILLISECONDS}, shutdown timeout: Period{time=10, timeUnit=MILLISECONDS}
21/08/19 11:17:54 INFO btrace.SparkSensorUtils: Sending init confJson
1/08/19 11:20:12 ERROR mainClasses.TestSap: Connection timed out (Connection timed out)
java.net.ConnectException: Connection timed out (Connection timed out)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188
```

Below is the code I am using.

Using Scalaj HTTP:

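```scala
import scalaj.http.{Http, HttpOptions}

// Allow restricted headers to be set on HttpURLConnection.
System.setProperty("sun.net.http.allowRestrictedHeaders", "true")

val spark = Context.getSparkSession()

import spark.implicits._

// Note: these settings apply to Spark/Hive internal communication,
// not to the outbound HTTP request made below.
spark.conf.set("spark.network.timeout", "3000s")
spark.conf.set("spark.executor.heartbeatInterval", "1500s")
spark.conf.set("hive.spark.client.server.connect.timeout", "100000ms")
spark.conf.set("hive.spark.client.connect.timeout", "100000ms")

// Call the REST API with basic auth; both timeouts are in milliseconds.
val result = Http(Url)
  .auth("xxxx", "yyyy")
  .option(HttpOptions.connTimeout(999999999))
  .option(HttpOptions.readTimeout(999999999))
  .asString
```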
Using scala.io.Source:

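```scala
import java.net.URL
import java.nio.charset.StandardCharsets
import java.util.Base64
import org.apache.spark.sql.DataFrame

@throws(classOf[java.io.IOException])
@throws(classOf[java.net.SocketTimeoutException])
def GetUrlContentJson(url: String): DataFrame = {
  // Assumes `spark` (the SparkSession from the previous snippet) is in scope.
  import spark.implicits._

  val userpass = "xxxx" + ":" + "yyyy"
  val basicAuth = "Basic " +
    Base64.getEncoder.encodeToString(userpass.getBytes(StandardCharsets.UTF_8))

  val connection = new URL(url).openConnection
  connection.setRequestProperty("Authorization", basicAuth)
  connection.setConnectTimeout(999999999) // milliseconds
  connection.setReadTimeout(999999999)    // milliseconds
  connection.setUseCaches(false)

  // Read the whole body, closing the stream even if reading fails.
  val in = connection.getInputStream
  val result =
    try scala.io.Source.fromInputStream(in).mkString
    finally in.close()

  // Parse the JSON string into a DataFrame, as the return type promises.
  spark.read.json(Seq(result).toDS())
}
```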
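One way to check whether the timeouts set in code are really the ones in effect on the cluster is to read them back from the connection and log them next to the request. The following is only a minimal sketch: `TimeoutCheck`, `effectiveTimeouts`, and the example URL are hypothetical names used for illustration and are not part of the job above.

```scala
import java.net.{URL, URLConnection}

object TimeoutCheck {
  // Applies the given timeouts and returns what the connection reports
  // back, as (connectTimeoutMs, readTimeoutMs).
  def effectiveTimeouts(url: String, connectMs: Int, readMs: Int): (Int, Int) = {
    // openConnection only creates the object; no network traffic happens yet.
    val connection: URLConnection = new URL(url).openConnection
    connection.setConnectTimeout(connectMs)
    connection.setReadTimeout(readMs)
    (connection.getConnectTimeout, connection.getReadTimeout)
  }

  def main(args: Array[String]): Unit = {
    // Placeholder URL; substitute the real endpoint.
    val (connectMs, readMs) = effectiveTimeouts("http://example.com/api", 60000, 60000)
    println(s"connect timeout = $connectMs ms, read timeout = $readMs ms")
  }
}
```

If the values printed on the cluster match the ones set in code, the 100 ms figures in the log probably belong to a different HTTP client than the one making this request.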
With the same URL in both cases, I get a response when running in IntelliJ, whereas running the same code on HDFS (as a Spark Scala JAR) gives a timeout exception.
It would be really helpful if anyone could help me resolve this issue.
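The stack trace above fails in `socketConnect`, that is, while the TCP connection is still being opened. A plain socket test run from a cluster node may therefore help separate a network reachability problem from an HTTP client setting. This is a minimal sketch under that assumption: `ConnectivityCheck` and `canConnect` are hypothetical names, and the host and port must be supplied by the caller.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.{Failure, Success, Try}

object ConnectivityCheck {
  // Tries to open a TCP connection to host:port within timeoutMs milliseconds.
  // Returns Success(()) if the connection was established, Failure otherwise.
  def canConnect(host: String, port: Int, timeoutMs: Int): Try[Unit] = Try {
    val socket = new Socket()
    try socket.connect(new InetSocketAddress(host, port), timeoutMs)
    finally socket.close()
  }

  def main(args: Array[String]): Unit = {
    // Usage: ConnectivityCheck <host> <port>
    val host = args(0)
    val port = args(1).toInt
    canConnect(host, port, 5000) match {
      case Success(_) => println(s"TCP connection to $host:$port succeeded")
      case Failure(e) => println(s"TCP connection to $host:$port failed: $e")
    }
  }
}
```

Running the same check locally and on the cluster shows whether the API host is reachable from both environments, independently of Scalaj HTTP or `URLConnection`.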