Pystan, Runtime error - Initialization failed

I'm trying to develop a Bayesian model using PyStan. The model compiles successfully, but when I sample from it I get a runtime error. The code is below:

my_code = '''
data {
  int N;
  int K1; 
  int K2; 
  real max_intercept;
  matrix[N, K1] X1;
  matrix[N, K2] X2;
  vector[N] y; 
}
parameters {
  vector<lower=0>[K1] beta1;  
  vector[K2] beta2; 
  real<lower=0, upper=max_intercept> alpha; 
  real<lower=0> noise_var; 
}
model {
  beta1 ~ normal(0, 1); 
  beta2 ~ normal(0, 1); 
  noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
  y ~ normal(X1*beta1 + X2*beta2 + alpha, sqrt(noise_var));
}
'''

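import pystan

# Compile step (not shown in the original post; assumed to be the usual PyStan 2 call)
sm1 = pystan.StanModel(model_code=my_code)

fit1 = sm1.sampling(data=input_data, iter=2000, chains=4, init=0.5, n_jobs=-1)  # the error is raised here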

I have checked every data point (no missing values, and no column that is constant throughout) and the data types (all float64). I also scaled the data using MinMaxScaler.

input_data = {
    'N': len(data_scaled), #836
    'K1': len(pos_var), #17 
    'K2': len(pos_neg_var),#29 
    'X1': X1, #(836,17)
    'X2': X2, #(836,17)
    'y': data['orders'].values,
    'max_intercept': min(data['orders']) #0
}

Below is the error I'm getting.

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\Users\abc\.conda\envs\stan_env\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\abc\.conda\envs\stan_env\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "stanfit4anon_model_a396b59aabedfaa132f3a814776a219f_7619586994410633893.pyx", line 371, in stanfit4anon_model_a396b59aabedfaa132f3a814776a219f_7619586994410633893._call_sampler_star
  File "stanfit4anon_model_a396b59aabedfaa132f3a814776a219f_7619586994410633893.pyx", line 404, in stanfit4anon_model_a396b59aabedfaa132f3a814776a219f_7619586994410633893._call_sampler
RuntimeError: Initialization failed.
"""

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<timed exec> in <module>

~\.conda\envs\stan_env\lib\site-packages\pystan\model.py in sampling(self, data, pars, chains, iter, warmup, thin, seed, init, sample_file, diagnostic_file, verbose, algorithm, control, n_jobs, **kwargs)
    776         call_sampler_args = izip(itertools.repeat(data), args_list, itertools.repeat(pars))
    777         call_sampler_star = self.module._call_sampler_star
--> 778         ret_and_samples = _map_parallel(call_sampler_star, call_sampler_args, n_jobs)
    779         samples = [smpl for _, smpl in ret_and_samples]
    780 

~\.conda\envs\stan_env\lib\site-packages\pystan\model.py in _map_parallel(function, args, n_jobs)
     83         try:
     84             pool = multiprocessing.Pool(processes=n_jobs)
---> 85             map_result = pool.map(function, args)
     86         finally:
     87             pool.close()

~\.conda\envs\stan_env\lib\multiprocessing\pool.py in map(self, func, iterable, chunksize)
    266         in a list that is returned.
    267         '''
--> 268         return self._map_async(func, iterable, mapstar, chunksize).get()
    269 
    270     def starmap(self, func, iterable, chunksize=None):

~\.conda\envs\stan_env\lib\multiprocessing\pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658 
    659     def _set(self, i, obj):

RuntimeError: Initialization failed.

I'm relatively new to PyStan and would appreciate any guidance.

There is 1 answer

Niha K On BEST ANSWER

I fixed the issue! This runtime error generally occurs when the data does not meet the constraints defined in the model.

  1. For instance, X contains some negative values when the model declares the constraint X > 0.
  2. The most common mistake is in Y, so make sure its values are not off. In my data a few Y values were 0; these passed both the missing-value and positive-value checks. After imputing them with the mean of Y, the problem was resolved.
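
The checks described above can be sketched as a small function to run before calling `sampling`. This is only a sketch under assumptions, not part of the original fix: `check_stan_inputs` is a hypothetical helper name and the toy data is made up for illustration. It also checks `max_intercept`, because the question sets it to `min(data['orders'])`, which is 0 there. With `real<lower=0, upper=max_intercept> alpha`, that leaves no room between the two bounds. This is my reading of why the zeros in Y mattered, and it would explain why imputing them helped.

```python
import numpy as np


def check_stan_inputs(input_data):
    """Return a list of problems found in the data dict (hypothetical helper)."""
    problems = []
    N, K1, K2 = input_data["N"], input_data["K1"], input_data["K2"]
    arrays = {
        "X1": (np.asarray(input_data["X1"], dtype=float), (N, K1)),
        "X2": (np.asarray(input_data["X2"], dtype=float), (N, K2)),
        "y": (np.asarray(input_data["y"], dtype=float), (N,)),
    }

    # Shapes must match matrix[N, K1], matrix[N, K2] and vector[N] in the
    # data block, and every entry must be a finite number.
    for name, (arr, expected_shape) in arrays.items():
        if arr.shape != expected_shape:
            problems.append(f"{name} has shape {arr.shape}, expected {expected_shape}")
        if not np.all(np.isfinite(arr)):
            problems.append(f"{name} contains NaN or infinite values")

    # alpha is declared real<lower=0, upper=max_intercept>, so the upper
    # bound has to be strictly greater than the lower bound of 0.
    if not input_data["max_intercept"] > 0:
        problems.append("max_intercept must be > 0 because alpha has lower=0")

    return problems


# Toy data (made up): y contains a zero, so min(y) gives max_intercept = 0.
rng = np.random.default_rng(0)
y = np.array([0.0, 3.0, 5.0, 2.0])
demo = {
    "N": 4,
    "K1": 2,
    "K2": 3,
    "X1": rng.random((4, 2)),
    "X2": rng.random((4, 3)),
    "y": y,
    "max_intercept": float(y.min()),
}

for problem in check_stan_inputs(demo):
    print(problem)
```

An empty list only means these particular checks passed; it does not guarantee that initialization will succeed.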

Happy learning!