I am using Spark 2.4.0 with "org.apache.bahir - spark-sql-cloudant - 2.4.0". I need to download all JSON documents from CouchDB to HDFS.
val df = spark
.partitionBy("year", "month", "day")
The total size is 160 GB (more than 13 million files). After running for about 5 minutes, the Spark job fails with this error:
Caused by: com.cloudant.client.org.lightcouch.CouchDbException: Error retrieving server response
Increasing the timeout does not help: the job still fails, just later. What are the ways out of this situation?
Use a different endpoint for the queries. Switching from _all_docs to _changes helped me.
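For illustration, here is a minimal sketch of how the endpoint switch might look with the Bahir connector. It assumes the connector's `cloudant.endpoint` option accepts `_changes`, as I recall from the sql-cloudant documentation. The host, credentials, database name, and output path are hypothetical placeholders, not values from the question, and the simple JSON write stands in for whatever write logic the original job uses.

```scala
import org.apache.spark.sql.SparkSession

object CouchDbToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("couchdb-to-hdfs")
      // Placeholder connection settings (assumptions, replace with your own).
      .config("cloudant.protocol", "http")
      .config("cloudant.host", "couchdb.example.com:5984")
      .config("cloudant.username", "user")
      .config("cloudant.password", "secret")
      // Read through the _changes feed instead of the default _all_docs.
      .config("cloudant.endpoint", "_changes")
      .getOrCreate()

    // "mydb" is a hypothetical database name.
    val df = spark.read
      .format("org.apache.bahir.cloudant")
      .load("mydb")

    // Hypothetical output location; adapt the write to your own layout.
    df.write.json("hdfs:///tmp/couchdb-export")

    spark.stop()
  }
}
```

If setting the option on the session does not take effect in your setup, passing it per read with `.option("cloudant.endpoint", "_changes")` may be worth trying; check the connector documentation for your version.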