PicklingError: Could not serialize object when calling an exchange rate API from my PySpark dataframe

I am a beginner in Databricks and PySpark. Currently, I have a PySpark dataframe with three columns:

  • date
  • amount
  • currency

I would like the amount column converted to EUR, computed with the exchange rate of the day. For that purpose, I use the exchange rates API to look up the rate, taking the date and the currency as parameters.
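
For context, here is a small sample of what the dataframe looks like (the values are made up for illustration):

# Illustrative sample with the same schema; spark is the session Databricks provides
df = spark.createDataFrame(
    [('2021-04-01', 100.0, 'USD'),
     ('2021-04-01', 250.0, 'EUR'),
     ('2021-04-02', 80.0, 'GBP')],
    ['date', 'amount', 'currency'])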

First, I defined a function that makes the API call to fetch the exchange rate. Here is my code:

import requests
from pyspark.sql import functions as F
from pyspark.sql.types import FloatType

def API(val1, currency, date):
  # Fetch the exchange rate for the given date and currency
  r = requests.get('https://api.exchangeratesapi.io/' + date, params={'symbols': currency})
  # Load the JSON response into a dataframe to extract the rate
  df = spark.read.json(sc.parallelize([r.json()]))
  df_value = df.select(F.col("rates." + currency))
  # The single selected column holds the rate for that currency
  value = df_value.collect()[0][0]
  # Convert the amount to EUR
  val = val1 * (1 / value)

  return float(val)
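
For what it's worth, the function itself seems fine when I call it directly on the driver, for example (illustrative values):

# Direct call on the driver, no error here
converted = API(100.0, 'USD', '2021-04-01')
print(converted)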

Then, I defined a UDF to apply this function to my dataframe:

API_Convert = F.udf(lambda x, y, z: API(x, y, z) if y != 'EUR' else x, FloatType())

When I try to execute this part, I get a pickling error that I absolutely don't understand:

df = df.withColumn('amount', API_Convert('amount', 'currency', 'date'))

PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
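
From reading SPARK-5063, I suspect the problem is that my function uses spark.read.json and sc.parallelize inside the UDF, which runs on the workers where the SparkContext is not available. Would something like the sketch below, which parses the API response with plain Python instead of Spark, be the right direction? (I am assuming the response has the shape {'rates': {<currency>: <rate>}, ...}.)

import requests

def API_no_spark(val1, currency, date):
  # Same API call, but the JSON is parsed with plain Python instead of
  # spark.read.json, so nothing in here touches the SparkContext
  r = requests.get('https://api.exchangeratesapi.io/' + date, params={'symbols': currency})
  # Assumed response shape: {'rates': {'USD': 1.19, ...}, 'base': 'EUR', ...}
  value = r.json()['rates'][currency]
  return float(val1 * (1 / value))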

Could you please help me fix this issue?

Many thanks!
