sample the same model with many sets of data

I am trying to fit the same PyMC3 model to many (~100) sets of data. Since I found no way to reuse the model, I create a new one for each set. This is the basic code:

import pymc3 as pm

result = []
for i in range(100):
  Y = data[i]
  mod = pm.Model()
  with mod:
    p = pm.Uniform('p', lower=0.1, upper=3, testval=2)
    Y_obs = pm.Binomial('Y_obs', n=100, p=p, observed=Y)
    err = 1
    try:
      trace = pm.sample(5000)
      err = 0
      result.append(trace)
    except Exception:
      pass  # skip data sets for which sampling fails
  del mod
  if err == 0:
    del trace

With this method the process becomes slower over time, and my RAM usage seems to increase until it is full, which is most likely the reason for the slowdown.

Is there a better way to fit the same model to different sets of data?

1 Answer

Answered by colcarroll

It looks like most of the trouble with your code is Python-related. You can reuse the model definition by returning the model from a function:

import pymc3 as pm

def my_model(y_obs):
    # Build a fresh model for one data set and return it
    with pm.Model() as model:
        p = pm.Uniform('p', lower=0.1, upper=3, testval=2)
        pm.Binomial('Y_obs', n=100, p=p, observed=y_obs)
    return model

Then you can iterate through your data:

result = []
for y_obs in data:
    with my_model(y_obs):
        result.append(pm.sample(5000))
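
The loop in the question also tried to skip data sets for which sampling fails. A minimal sketch of that pattern follows, written without PyMC3 so it runs on its own. The names `fit_all`, `fit_one`, and `demo_fit` are hypothetical, and `fit_one` stands in for "build the model for one data set and sample from it".

```python
def fit_all(datasets, fit_one):
    """Apply fit_one to each data set, collecting results and failures.

    fit_one is a hypothetical callable standing in for
    "build the model for this data set and sample from it".
    """
    results = {}   # data set index -> whatever fit_one returned
    failures = {}  # data set index -> the exception that was raised
    for i, y_obs in enumerate(datasets):
        try:
            results[i] = fit_one(y_obs)
        except Exception as exc:  # broad on purpose: one bad fit should not stop the run
            failures[i] = exc
    return results, failures


# Stand-in fit function for demonstration only. With PyMC3 it would wrap
# something like: with my_model(y_obs): return pm.sample(5000)
def demo_fit(y_obs):
    if y_obs < 0:
        raise ValueError("counts must be non-negative")
    return y_obs / 100


results, failures = fit_all([10, -1, 50], demo_fit)
print(results)           # {0: 0.1, 2: 0.5}
print(sorted(failures))  # [1]
```

Keeping the failures keyed by index makes it possible to see afterwards which data sets need a second look, rather than silently dropping them.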

Diagnosing the memory issues would require knowing more about what the data looks like. You might try sampling all the data sets at once, with something like:

# Assumes each entry of data is a single count, so p and data both have shape (len(data),)
with pm.Model() as model:
    p = pm.Uniform('p', lower=0.1, upper=3, shape=len(data))
    pm.Binomial('Y_obs', n=100, p=p, observed=data)
    trace = pm.sample(5000)

This should speed things up, but will not help much with memory.
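
One way to start diagnosing the memory growth is to record how much memory Python is tracking after each fit. The sketch below uses the standard-library `tracemalloc` module. The names `memory_log`, `fit_one`, and `leaky_fit` are hypothetical, and `leaky_fit` is a deliberately leaking stand-in for a real fit. Note that `tracemalloc` only reports memory allocated through Python's allocator, so memory held inside compiled extension code may not show up here.

```python
import gc
import tracemalloc


def memory_log(datasets, fit_one):
    """Run fit_one on each data set and record traced memory after each fit.

    Returns a list of (index, bytes_currently_traced) tuples.
    """
    log = []
    tracemalloc.start()
    try:
        for i, y_obs in enumerate(datasets):
            fit_one(y_obs)  # result is discarded on purpose
            gc.collect()    # drop unreachable objects before measuring
            current, _peak = tracemalloc.get_traced_memory()
            log.append((i, current))
    finally:
        tracemalloc.stop()
    return log


# Stand-in for a real fit: leaks on purpose by appending to a global list.
_leak = []


def leaky_fit(y_obs):
    _leak.append([y_obs] * 10_000)


for i, used in memory_log(range(5), leaky_fit):
    print(f"fit {i}: {used / 1e6:.2f} MB traced")
```

If the traced figure stays flat while the process as a whole keeps growing, the growth is probably happening outside ordinary Python objects, which narrows down where to look.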