Hierarchical bayesian interaction model specification using pymc3

43 views Asked by At

I'm currently working on modeling a 2-level hierarchical Bayesian regression using pymc3 in Python. I've extensively searched for resources on Bayesian hierarchical regression, but most examples I found involve hierarchical models with a categorical variable at the second level, resembling the Hierarchical Linear model from the pymc3 documentation: GLM-hierarchical.

However, I'm in need of a hierarchical model where a beta (coefficient) on one level is also a linear function of another variable at the second level. I'm envisioning a model like this: :

$$ Y = \alpha+\Beta_1 X_1$$

$$ \alpha \sim N(0,10000) $$

$$ \Y\sim N(\alpha+\Beta_1X_1,\tau) $$

$$ \tau \sim IG(0.0001,0.0001) $$

and the B1 has to be a function to be estimated such as :

$$ \Beta1 \sim N(second-level \alpha + second-level \beta X_2, error) $$

and my code which i tried before is :

#no dummy version
import pymc3 as pm
n_iter = 15000
burn_in = 5000
import pandas as pd

data = pd.read_csv("1021winbugsmodel.csv")
with pm.Model() as model:
    #prior
    
    a= pm.Normal('Intercept',mu=0,sd=1000)

    b2 = pm.Normal('distance',mu=0,sd=1000)
    b4 = pm.Normal('reference_volume:similarity',mu=0,sd=1000)
    b5 = pm.Normal('distance:similarity',mu=0,sd=1000)
    b6 = pm.Normal('density',mu=0,sd=1000)
    b8 = pm.Normal('entropy',mu=0,sd=1000)


    #data
    x1 = pm.Data('x1',data['reference_volume'].values)
    x2 = pm.Data('x2',data['distance'].values)

    x4 = pm.Data('x4',data['reference_volume:similarity'].values)
    x5 = pm.Data('x5',data['distance:similarity'].values)
    x6 = pm.Data('x6',data['density'].values)

    x8 = pm.Data('x8',data['entropy'].values)

    y = pm.Data('y',data['target_volume'].values)
    tau = pm.Gamma('tau',alpha = 0.0001, beta = 0.0001)
    
    #second_level
    second_tau = pm.Gamma('2L_tau',alpha = 0.0001, beta =0.0001)
    second_a = pm.Normal('2L_Intercept', mu=0,sd=1000)
    second_beta = pm.Normal('2L_beta',mu=0,sd=1000)
    
    beta1_est = second_a + second_beta*x2
    
    b1= pm.Normal('reference_volume',mu=beta1_est,sd=second_tau,observed = x1)
    
    #likelihood    
    
    y_est = a+b1*x1+ b4*x4 + b5*x5+ b6*x6 + b8*x8 
    
    likelihood = pm.Normal('target_volume',mu =y_est,tau =tau,observed =y)

    

    
    trace = pm.sample(n_iter,cores =1 ,return_inferencedata = True,idata_kwargs={"log_likelihood": False})

one-level variable(X1) represents reference_volume in my code.

and i want to model so that beta of X1 is estimated as a linear function of different intercept and another variable named 'distance'

hopefully i made myslelf clear...

My question is, does pymc3 support this form of model specification? If so, could you provide guidance or point me to relevant resources?

I genuinely appreciate your help.

Thanks.

I tried two-level estimation like the code i included in question. but didnt work and it gives me this error :

`ERROR (theano.graph.opt): Optimization failure due to: LocalOptGroup(local_useless_fill,local_useless_alloc,local_subtensor_make_vector,local_useless_elemwise,local_useless_inc_subtensor,local_useless_slice,local_subtensor_of_alloc,local_useless_inc_subtensor_alloc,local_useless_rebroadcast,local_join_1,local_join_empty,local_join_make_vector,local_useless_switch,local_useless_tile,local_useless_split,local_useless_reshape,local_useless_elemwise_comparison,local_useless_reduce,local_view_op,local_merge_alloc,local_useless_topk)
ERROR (theano.graph.opt): node: ViewOp(Elemwise{Cast{float64}}.0)

TypeError                                 Traceback (most recent call last)
File ~/miniconda3/lib/python3.9/site-packages/theano/compile/function/types.py:974, in Function.__call__(self, *args, **kwargs)
    972 try:
    973     outputs = (
--> 974         self.fn()
    975         if output_subset is None
    976         else self.fn(output_subset=output_subset)
    977     )
    978 except Exception:

TypeError: expected type_num 5 (NPY_INT32) got 12

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[6], line 53
     46 y_est = a+b1*x1+ b4*x4 + b5*x5+ b6*x6 + b8*x8 
     48 likelihood = pm.Normal('target_volume',mu =y_est,tau =tau,observed =y)
---> 53 trace = pm.sample(n_iter,cores =1 ,return_inferencedata = True,idata_kwargs={"log_likelihood": False})
     55 t = trace.sel(draw=slice(burn_in,None))
     56 sumdf = pm.stats.summary(trace,var_names=['Intercept','reference_volume','reference_volume:similarity','distance:similarity','density','entropy','2L_Intercept','2L_beta'])

File ~/miniconda3/lib/python3.9/site-packages/pymc3/sampling.py:428, in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, return_inferencedata, idata_kwargs, mp_ctx, pickle_backend, **kwargs)
    426 start = deepcopy(start)
    427 if start is None:
--> 428     check_start_vals(model.test_point, model)
    429 else:
    430     if isinstance(start, dict):

File ~/miniconda3/lib/python3.9/site-packages/pymc3/util.py:234, in check_start_vals(start, model)
    228     valid_keys = ", ".join(model.named_vars.keys())
    229     raise KeyError(
    230         "Some start parameters do not appear in the model!\n"
    231         "Valid keys are: {}, but {} was supplied".format(valid_keys, extra_keys)
    232     )
--> 234 initial_eval = model.check_test_point(test_point=elem)
    236 if not np.all(np.isfinite(initial_eval)):
    237     raise SamplingError(
    238         "Initial evaluation of model at starting point failed!\n"
    239         "Starting values:\n{}\n\n"
    240         "Initial evaluation results:\n{}".format(elem, str(initial_eval))
    241     )

File ~/miniconda3/lib/python3.9/site-packages/pymc3/model.py:1384, in Model.check_test_point(self, test_point, round_vals)
   1380 if test_point is None:
   1381     test_point = self.test_point
   1383 return Series(
-> 1384     {RV.name: np.round(RV.logp(test_point), round_vals) for RV in self.basic_RVs},
   1385     name="Log-probability of test_point",
   1386 )

File ~/miniconda3/lib/python3.9/site-packages/pymc3/model.py:1384, in <dictcomp>(.0)
   1380 if test_point is None:
   1381     test_point = self.test_point
   1383 return Series(
-> 1384     {RV.name: np.round(RV.logp(test_point), round_vals) for RV in self.basic_RVs},
   1385     name="Log-probability of test_point",
   1386 )

File ~/miniconda3/lib/python3.9/site-packages/pymc3/model.py:1561, in LoosePointFunc.__call__(self, *args, **kwargs)
   1559 def __call__(self, *args, **kwargs):
   1560     point = Point(model=self.model, *args, **kwargs)
-> 1561     return self.f(**point)

File ~/miniconda3/lib/python3.9/site-packages/theano/compile/function/types.py:987, in Function.__call__(self, *args, **kwargs)
    985     if hasattr(self.fn, "thunks"):
    986         thunk = self.fn.thunks[self.fn.position_of_error]
--> 987     raise_with_op(
    988         self.maker.fgraph,
    989         node=self.fn.nodes[self.fn.position_of_error],
    990         thunk=thunk,
    991         storage_map=getattr(self.fn, "storage_map", None),
    992     )
    993 else:
    994     # old-style linkers raise their own exceptions
    995     raise

File ~/miniconda3/lib/python3.9/site-packages/theano/link/utils.py:508, in raise_with_op(fgraph, node, thunk, exc_info, storage_map)
    505     warnings.warn(f"{exc_type} error does not allow us to add extra error message")
    506     # Some exception need extra parameter in inputs. So forget the
    507     # extra long error message in that case.
--> 508 raise exc_value.with_traceback(exc_trace)

File ~/miniconda3/lib/python3.9/site-packages/theano/compile/function/types.py:974, in Function.__call__(self, *args, **kwargs)
    971 t0_fn = time.time()
    972 try:
    973     outputs = (
--> 974         self.fn()
    975         if output_subset is None
    976         else self.fn(output_subset=output_subset)
    977     )
    978 except Exception:
    979     restore_defaults()

TypeError: expected type_num 5 (NPY_INT32) got 12
Apply node that caused the error: Elemwise{Composite{Switch(i0, (i1 * ((i2 * i3 * sqr((Cast{float64}(i4) - (i5 + (i6 * i7) + (i8 * i9) + (i10 * i11) + (i12 * i13) + (i14 * i15))))) + i16)), i17)}}(Elemwise{Composite{Cast{int8}(GT(inv(sqrt(i0)), i1))}}.0, TensorConstant{(1,) of 0.5}, TensorConstant{(1,) of -1.0}, Elemwise{exp,no_inplace}.0, y, InplaceDimShuffle{x}.0, reference_volume, x1, InplaceDimShuffle{x}.0, x4, InplaceDimShuffle{x}.0, x5, InplaceDimShuffle{x}.0, x6, InplaceDimShuffle{x}.0, x8, Elemwise{Composite{log((i0 * i1))}}.0, TensorConstant{(1,) of -inf})
Toposort index: 11
Inputs types: [TensorType(int8, (True,)), TensorType(float64, (True,)), TensorType(float64, (True,)), TensorType(float64, (True,)), TensorType(int32, vector), TensorType(float64, (True,)), TensorType(int32, vector), TensorType(int32, vector), TensorType(float64, (True,)), TensorType(float64, vector), TensorType(float64, (True,)), TensorType(float64, vector), TensorType(float64, (True,)), TensorType(int32, vector), TensorType(float64, (True,)), TensorType(float64, vector), TensorType(float64, (True,)), TensorType(float32, (True,))]
Inputs shapes: [(1,), (1,), (1,), (1,), (60164,), (1,), (60164,), (60164,), (1,), (60164,), (1,), (60164,), (1,), (60164,), (1,), (60164,), (1,), (1,)]
Inputs strides: [(1,), (8,), (8,), (8,), (4,), (8,), (8,), (4,), (8,), (8,), (8,), (8,), (8,), (4,), (8,), (8,), (8,), (4,)]
Inputs values: [array([1], dtype=int8), array([0.5]), array([-1.]), array([1.]), 'not shown', array([0.]), 'not shown', 'not shown', array([0.]), 'not shown', array([0.]), 'not shown', array([0.]), 'not shown', array([0.]), 'not shown', array([-1.83787707]), array([-inf], dtype=float32)]
Outputs clients: [[Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, (i1 * ((i2 * i3 * sqr((Cast{float64}(i4) - (i5 + (i6 * i7) + (i8 * i9) + (i10 * i11) + (i12 * i13) + (i14 * i15))))) + i16)), i17)}}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
0

There are 0 answers