I have problems with the following piece of code:
def skewTemperature(cloudantdata, spark):
    mean = meanTemperature(cloudantdata, spark)
    sd = sdTemperature(cloudantdata, spark)
    query = """SELECT (1/count(temperature)) * (sum(pow(temperature-%s,3))/pow(%s,3)) AS skew FROM washing""" % (mean, sd)
    return spark.sql(query).first().skew
meanTemperature and sdTemperature are working fine, but with the above query I am getting the following error:
Py4JJavaError: An error occurred while calling o2849.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 315.0 failed 10 times, most recent failure: Lost task 3.9 in stage 315.0 (TID 1532, yp-spark-dal09-env5-0045): java.lang.RuntimeException: Database washing request error: {"error":"too_many_requests","reason":"You've exceeded your current limit of 5 requests per second for query class. Please try later.","class":"query","rate":5
Does anybody know how to fix this?
The error indicates that you are exceeding the Cloudant API invocation threshold for the query class, which appears to be 5 requests per second for the service plan you are using. One potential solution is to limit the number of partitions by setting the jsonstore.rdd.partitions configuration property. This setting essentially caps how many concurrent requests are sent to Cloudant. Start with 5 and work your way down to 1 if the error persists. If a setting of 1 does not resolve the issue, you may have to consider upgrading to a service plan with a higher threshold.
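A minimal Spark 2 sketch of what setting this property could look like, assuming the Apache Bahir Cloudant connector; the host and credential values are placeholders you would replace with your own service details:

```python
from pyspark.sql import SparkSession

# Placeholder Cloudant connection details -- substitute your own service values.
spark = SparkSession.builder \
    .config("cloudant.host", "ACCOUNT.cloudant.com") \
    .config("cloudant.username", "USERNAME") \
    .config("cloudant.password", "PASSWORD") \
    .config("jsonstore.rdd.partitions", 1) \
    .getOrCreate()

# Load the database and register it so the skew query's FROM washing resolves.
washing = spark.read.load("washing", "org.apache.bahir.cloudant")
washing.createOrReplaceTempView("washing")
```

With jsonstore.rdd.partitions set to 1, the load is performed in a single partition, so at most one request at a time is issued against Cloudant, which should keep you under the 5/sec limit.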