I am trying to fit some random data to a GP with the RBF kernel, using the GPy package. When I change the active dimensions, I get the error "LinAlgError: not positive definite, even with jitter". The error only occurs in a conda environment; when I install with pip, I never run into it. Has anyone come across this?
import numpy as np
import GPy
import random

def func(x):
    return np.sum(np.power(x, 5) - np.power(x, 3))

# 20 random data with 10 dimensions
random.seed(2)
random_sample = [[random.uniform(0,3.4) for i in range(10)] for j in range(20)]

# get the first random sample as an observed data
y = np.array([func(random_sample[0])])
X = np.array([random_sample[0]])
y.shape = (1, 1)
X.shape = (1, 10)

# different set of dimensions
set_dim = [[np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])],
           [np.array([0, 1]), np.array([2, 3]), np.array([4, 5]), np.array([6, 7]), np.array([8, 9])],
           [np.array([0, 1, 2, 3, 4]), np.array([5, 6, 7, 8, 9])],
           [np.array([0, 1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]]

for i in range(len(set_dim)):
    # new kernel based on active dims
    k = GPy.kern.Add([GPy.kern.RBF(input_dim=len(set_dim[i][x]), active_dims=set_dim[i][x]) for x in range(len(set_dim[i]))])
    # increase data set with the next random sample
    y = np.concatenate((y, np.array([[func(random_sample[i+1])]])))
    X = np.concatenate((X, np.array([random_sample[i+1]])))
    model = GPy.models.GPRegression(X, y, k)
    model.optimize()
Possible Channel-Mixing Issue
Sometimes package builds from across different channels (e.g., anaconda versus conda-forge) are incompatible. The times I've encountered this, it happened when compiled symbols were referenced across packages, and the different build stacks used on the channels used different symbol names, leading to missing symbols when mixing.
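One way to check whether an environment mixes channels is to read the records Conda keeps under conda-meta in the environment prefix. The following is only a sketch, not something from the original report: the helper name package_channels is made up for illustration, and it assumes each JSON record carries name, version, build, and channel fields, as recent Conda versions write them.

```python
import json
import sys
from pathlib import Path


def package_channels(prefix, names):
    """Map each requested package name to (version, build, channel).

    Hypothetical helper: reads the per-package JSON records that Conda
    stores in <prefix>/conda-meta. Returns an empty dict when the
    directory is missing (e.g., a plain pip/venv environment).
    """
    found = {}
    meta_dir = Path(prefix) / "conda-meta"
    if not meta_dir.is_dir():
        return found
    for record_path in sorted(meta_dir.glob("*.json")):
        with open(record_path, encoding="utf-8") as fh:
            record = json.load(fh)
        name = record.get("name")
        if name in names:
            # Assumed field names; missing fields come back as None.
            found[name] = (
                record.get("version"),
                record.get("build"),
                record.get("channel"),
            )
    return found


if __name__ == "__main__":
    # sys.prefix is the environment root when run from an activated env.
    wanted = {"gpy", "numpy", "scipy"}
    for name, info in sorted(package_channels(sys.prefix, wanted).items()):
        print(name, *info)
```

If gpy reports a conda-forge channel while numpy and scipy report a defaults channel, the environment is mixed in the way described here. Running conda list --show-channel-urls should give the same information without any code.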
I can report that using the exact same package versions as OP, but prioritizing the Conda Forge channel builds, gives me reliable behavior. While not conclusive, this would be consistent with the issue somehow coming from mixing the Conda Forge build of GPy with Anaconda builds of its dependencies (e.g., numpy, scipy). Specifically suggestive is the fact that I have the exact same GPy build and that module is where the error originates. At the same time, there is nothing in the error that immediately suggests this is a channel mixing issue.
Workaround
In practice, I avoid channel mixing issues by always using YAML definitions to create my environments. This is a helpful practice because it encourages one to explicitly state the channel priority as part of the definition and it makes Conda aware of your preference from the outset. The following environment definition works for me:
gpy_cf.yaml
and using
Unless you really do need these exact versions, I would remove whatever version constraints are unnecessary (at the very least, drop the patch-level version numbers).
Broken Version
For the record, this is the version that I can replicate the error with:
gpy_mixed.yaml
In this case, we force gpy to come from Conda Forge and let everything else be sourced from the Anaconda (defaults) channel, similar to the configuration found in OP.