How to copy binary files to the worker nodes on Databricks?


I have the following code

  dbutils.fs.cp('dbfs:/mnt/loc/PyM.cpython-310-x86_64-linux-gnu.so', 'dbfs:/tmp/simple')
  import PyM

and a function

  import pandas as pd

  def test(df):
    data = {'c': [], 'name': []}
    data['c'].append(df['_c0'].iat[0])
    mc = PyM.PyM(Ln=30, dy=2)
    data['name'].append(f"Module version: {mc.get_build_info()}")
    return pd.DataFrame.from_dict(data, orient='index').transpose()
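
The schema passed to applyInPandas later has to match the two columns this function returns ('c' and 'name'); a minimal sketch of what tSchema could look like, assuming both come back as strings (the actual type of '_c0' is an assumption):

  from pyspark.sql.types import StructType, StructField, StringType

  # hypothetical definition of tSchema; the column names must match test()'s
  # output, the types should match whatever df['_c0'] and get_build_info() return
  tSchema = StructType([
      StructField('c', StringType()),
      StructField('name', StringType()),
  ])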

I have a PySpark DataFrame lines. Running the function locally on the driver

  df = lines.limit(2).toPandas()
  df1 = test(df)

works correctly.
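
That the local call works suggests the compiled module is already resolvable in the driver process; the explicit version of that setup would look roughly like the sketch below (the local directory name is made up, and it assumes the .so is reachable under its original filename through the /dbfs FUSE mount):

  import os
  import shutil
  import sys

  # copy the compiled module from the DBFS FUSE mount to driver-local disk
  # (paths here are assumptions based on the mount location used above)
  so_name = 'PyM.cpython-310-x86_64-linux-gnu.so'
  local_dir = '/tmp/pym'
  os.makedirs(local_dir, exist_ok=True)
  shutil.copyfile('/dbfs/mnt/loc/' + so_name, os.path.join(local_dir, so_name))

  # the directory holding the .so must be on sys.path for `import PyM` to resolve
  if local_dir not in sys.path:
      sys.path.insert(0, local_dir)

  import PyM

This only affects the driver process, though; applyInPandas runs test in executor-side Python workers, which do their own module lookup.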

However

  dResultAll = lines.groupby('_c0').applyInPandas(test, schema=tSchema)

produces "ModuleNotFoundError: No module named 'PyM'"

I believe this is due to the absence of the binary file PyM.cpython-310-x86_64-linux-gnu.so on the worker nodes.

How do I get this to work?
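
Is the right pattern something like the sketch below? It ships the file to every executor with SparkContext.addFile and does the import inside the function (the FUSE source path is taken from the mount above, and tSchema is the hedged definition sketched earlier):

  import pandas as pd

  # ship the compiled module to the driver and every executor; the FUSE path
  # /dbfs/mnt/loc/... is readable as an ordinary local file on the driver
  sc.addFile('/dbfs/mnt/loc/PyM.cpython-310-x86_64-linux-gnu.so')

  def test(df):
    # resolve the module on the executor: the import has to happen here,
    # after putting the directory populated by addFile onto sys.path
    import sys
    from pyspark import SparkFiles
    so_dir = SparkFiles.getRootDirectory()
    if so_dir not in sys.path:
      sys.path.insert(0, so_dir)
    import PyM

    data = {'c': [], 'name': []}
    data['c'].append(df['_c0'].iat[0])
    mc = PyM.PyM(Ln=30, dy=2)
    data['name'].append(f"Module version: {mc.get_build_info()}")
    return pd.DataFrame.from_dict(data, orient='index').transpose()

  dResultAll = lines.groupby('_c0').applyInPandas(test, schema=tSchema)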

I have tried the approach from https://docs.databricks.com/en/_extras/notebooks/source/kb/python/run-c-plus-plus-python.html

  import os
  import shutil

  num_worker_nodes = 1

  def copyFile(filepath):
    # copy from the DBFS FUSE mount (/dbfs/...) to the same path on local disk
    shutil.copyfile("/dbfs%s" % filepath, filepath)
    os.system("chmod u+x %s" % filepath)

  # run enough tasks that every worker should execute the copy at least once
  sc.parallelize(range(0, 2 * (1 + num_worker_nodes))).map(lambda s: copyFile("/tmp/simple")).count()

in the hope of getting the binary file onto the worker nodes, but that did not fix the issue.
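
Even if that copy succeeds, two gaps would still seem to remain (assumptions on my part): the file on each worker has to keep an importable name such as PyM.cpython-310-x86_64-linux-gnu.so (the earlier cp to dbfs:/tmp/simple may have renamed it), and test has to add that directory to sys.path and import PyM itself, because applyInPandas runs in separate Python worker processes that never saw the notebook-level import. Roughly:

  # hypothetical adjustment, reusing copyFile and num_worker_nodes from above:
  # keep the original filename end to end and resolve the module inside the UDF
  so_name = "PyM.cpython-310-x86_64-linux-gnu.so"
  dbutils.fs.cp("dbfs:/mnt/loc/" + so_name, "dbfs:/tmp/" + so_name)
  sc.parallelize(range(0, 2 * (1 + num_worker_nodes))).map(
      lambda s: copyFile("/tmp/" + so_name)).count()

  def test(df):
    import sys
    if "/tmp" not in sys.path:
      sys.path.insert(0, "/tmp")
    import PyM          # now resolvable on the executor
    # ... same body as the original test() above

A cluster-scoped init script that performs the copy when each node starts would presumably also be more robust than the parallelize trick, since executors added later never run the one-off job.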
