Why do the trace values have periods of (unwanted) stability?

100 views Asked by At

I have a fairly simple test data set I am trying to fit with pymc3.

The result generated by traceplot looks something like this. Essentially the trace of all parameter look like there is a standard 'caterpillar' for 100 iterations, followed by a flat line for 750 iterations, followed by the caterpillar again.

The initial 100 iterations happen after 25,000 ADVI iterations, and 10,000 tune iterations. If I change these amounts, I randomly will/won't have these periods of unwanted stability.

I'm wondering if anyone has any advice about how I can stop this from happening - and what is causing it?

Thanks.

The full code is below. In brief, I am generating a set of 'phases' (-pi -> pi) with a corresponding set of values y = a(j)*sin(phase) + b(j)*sin(phase). a and b are drawn for each subject j at random, but are related to each other. I then essentially try to fit this same model.

Edit: Here is a similar example, running for 25,000 iterations. Something goes wrong around iteration 20,000.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
%matplotlib inline

np.random.seed(0)


n_draw = 2000
n_tune = 10000
n_init = 25000
init_string = 'advi'
target_accept = 0.95

##
# Generate some test data
# Just generates:
# x a vector of phases
# y a vector corresponding to some sinusoidal function of x
# subject_idx a vector corresponding to which subject x is

#9 Subjects
N_j = 9

#Each with 276 measurements
N_i = 276
sigma_y = 1.0
mean = [0.1, 0.1]
cov = [[0.01, 0], [0, 0.01]]  # diagonal covariance

x_sub = np.zeros((N_j,N_i))
y_sub = np.zeros((N_j,N_i))
y_true_sub = np.zeros((N_j,N_i))
ab_sub = np.zeros((N_j,2))
tuning_sub = np.zeros((N_j,1))
sub_ix_sub = np.zeros((N_j,N_i))
for j in range(0,N_j):
    aj,bj = np.random.multivariate_normal(mean, cov)
    #aj = np.abs(aj)
    #bj = np.abs(bj)
    xj = np.random.uniform(-1,1,size = (N_i,1))*np.pi
    xj = np.sort(xj)#for convenience
    yj_true = aj*np.sin(xj) + bj*np.cos(xj)
    yj = yj_true + np.random.normal(scale=sigma_y, size=(N_i,1))

    x_sub[j,:] = xj.ravel()
    y_sub[j,:] = yj.ravel()
    y_true_sub[j,:] = yj_true.ravel()

    ab_sub[j,:] = [aj,bj]
    tuning_sub[j,:] = np.sqrt(((aj**2)+(bj**2)))
    sub_ix_sub[j,:] = [j]*N_i

x = np.ravel(x_sub)
y = np.ravel(y_sub)
subject_idx = np.ravel(sub_ix_sub)
subject_idx = np.asarray(subject_idx, dtype=int)

##
# Fit model
hb1_model = pm.Model()
with hb1_model:
#    Hyperpriors
    hb1_mu_a = pm.Normal('hb1_mu_a', mu=0., sd=100)
    hb1_sigma_a = pm.HalfCauchy('hb1_sigma_a', 4)
    hb1_mu_b = pm.Normal('hb1_mu_b', mu=0., sd=100)
    hb1_sigma_b = pm.HalfCauchy('hb1_sigma_b', 4)

    # We fit a mixture of a sine and cosine with these two coeffieicents
    # allowed to be different for each subject
    hb1_aj = pm.Normal('hb1_aj', mu=hb1_mu_a, sd=hb1_sigma_a, shape = N_j)
    hb1_bj = pm.Normal('hb1_bj', mu=hb1_mu_b, sd=hb1_sigma_b, shape = N_j)

    # Model error
    hb1_eps = pm.HalfCauchy('hb1_eps', 5)

    hb1_linear = hb1_aj[subject_idx]*pm.math.sin(x) + hb1_bj[subject_idx]*pm.math.cos(x)
    hb1_linear_like = pm.Normal('y', mu = hb1_linear, sd=hb1_eps, observed=y)

with hb1_model:
    hb1_trace = pm.sample(draws=n_draw, tune = n_tune,
                      init = init_string, n_init = n_init,
                     target_accept = target_accept)

pm.traceplot(hb1_trace)
2

There are 2 answers

0
altroware On BEST ANSWER

I'd look at the divergencies, as explained in notes and literature on Hamiltonian Monte Carlo, see, e.g., here and here.

with model:
    np.savetxt('diverging.csv', hb1_trace['diverging'])

As a dirty solution, you can try to increase target_accept, perhaps.

Good luck!

0
jrmxn On

To partially answer my own question: After playing with this for a while, it looks like the problem might be due to the hyperprior standard deviation going to 0. I am not sure why the algorithm should get stuck there though (testing a small standard deviation can't be that uncommon...).

In any case, two solutions that seem to alleviate the problem (although they don't remove it entirely) are:

1) Add an offset to the definitions of the standard deviation. e.g.:

offset = 1e-2
hb1_sigma_a = offset + pm.HalfCauchy('hb1_sigma_a', 4)

2) Instead of using a HalfCauchy or HalfNormal for the SD prior, use a logNormal distribution set so that 0 is unlikely.