PyMC3/Edward/Pyro on Spark?

Has anyone tried using a Python probabilistic programming library with Spark? Or does anyone have a good idea of what it would take?

I have a feeling Edward would be the simplest, because there are already tools connecting TensorFlow and Spark, but I'm still hazy about what low-level code changes would be required.

I know distributed MCMC is still an area of active research (see MC-Stan on Spark?), so is this even reasonable to implement? Thanks!

There are 2 answers

Germán Alfaro On

You can use the TensorFlow connectors with Edward, since it is built on TensorFlow. One of the main drawbacks of MCMC is that it is very computationally intensive, so you may want to try variational inference for your Bayesian models, which approximates the target distribution instead of sampling from it (I believe this also applies to Pyro and PyMC3). You can also work with Distributed TensorFlow.
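To make the variational inference suggestion a bit more concrete, here is a minimal, library-free sketch. It is not Edward, Pyro, or PyMC3 code; the toy model (Normal likelihood with known variance, Normal prior on the mean), the function name `fit_gaussian_vi`, and the step size are all assumptions chosen for illustration. It fits q(mu) = Normal(m, s^2) by gradient ascent on the closed-form ELBO.

```python
import math
import random

# Toy model (assumed for illustration): x_i ~ Normal(mu, 1), mu ~ Normal(0, 10^2).
random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(50)]

SIGMA2 = 1.0   # known observation variance
TAU2 = 100.0   # prior variance on mu


def fit_gaussian_vi(xs, steps=2000, lr=0.01):
    """Fit q(mu) = Normal(m, s^2) by gradient ascent on the analytic ELBO.

    The step size assumes lr * (n / SIGMA2 + 1 / TAU2) stays well below 2;
    larger datasets would need a smaller lr.
    """
    n, total = len(xs), sum(xs)
    precision = n / SIGMA2 + 1.0 / TAU2
    m, log_s = 0.0, 0.0  # optimise log(s) so that s stays positive
    for _ in range(steps):
        # dELBO/dm = sum(x_i - m) / sigma^2 - m / tau^2
        grad_m = total / SIGMA2 - precision * m
        # dELBO/dlog(s) = 1 - s^2 * (n / sigma^2 + 1 / tau^2)
        grad_log_s = 1.0 - precision * math.exp(2.0 * log_s)
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, math.exp(log_s)


m, s = fit_gaussian_vi(data)
print(f"variational posterior: mean={m:.4f}, sd={s:.4f}")
```

Because this toy model is conjugate, the Gaussian q can match the true posterior exactly; for most real models the variational fit is only an approximation, and the libraries above estimate the ELBO gradient stochastically rather than in closed form.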

I also recommend trying a library called Dask (https://dask.pydata.org/en/latest/). It lets you scale your model from your workstation to a cluster, and it also has TensorFlow connectors.

Hope this helps

fritzo On

I've seen people run Pyro+PyTorch in PySpark, but the use case was CPU-only and did not involve distributed training.
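One plausible shape for that kind of CPU-only, non-distributed use is fitting a separate small model on each partition. This is an assumption; the setup is not described above. The sketch below uses only the standard library: `fit_partition` is a hypothetical function with the iterator-in, iterable-out shape that PySpark's `RDD.mapPartitions` expects, the closed-form Normal-Normal update stands in for a real Pyro fit, and a thread pool stands in for Spark executors.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fit_partition(rows):
    """Fit one independent model to the rows of a single partition.

    Yields one (n, posterior_mean, posterior_sd) tuple per non-empty partition.
    Assumed model: x_i ~ Normal(mu, 1) with prior mu ~ Normal(0, 10^2).
    """
    xs = list(rows)
    if not xs:
        return  # empty partition: yield nothing
    sigma2, tau2 = 1.0, 100.0
    precision = len(xs) / sigma2 + 1.0 / tau2
    mean = (sum(xs) / sigma2) / precision
    yield len(xs), mean, math.sqrt(1.0 / precision)


# Local stand-in for partitions of an RDD (made-up numbers).
partitions = [
    [1.9, 2.1, 2.0, 2.2],
    [4.8, 5.1, 5.3],
    [-0.2, 0.1, 0.0, 0.3, -0.1],
]

with ThreadPoolExecutor(max_workers=3) as pool:
    per_partition = pool.map(lambda p: list(fit_partition(iter(p))), partitions)
    results = [row for part in per_partition for row in part]

for n, mean, sd in results:
    print(f"n={n} posterior mean={mean:.3f} sd={sd:.3f}")
```

In PySpark the same function could be passed as `rdd.mapPartitions(fit_partition)`. Nothing is shared between partitions, so this is parallel fitting of separate models, not distributed training of a single model.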