LinAlgError: not positive definite, even with jitter. When using a conda environment instead of pip


I am trying to fit some random data to a GP with an RBF kernel using the GPy package. When I change the active dimensions, I get LinAlgError: not positive definite, even with jitter. This error occurs only in a conda environment; with a pip install I have never run into it. Has anyone come across this?

import numpy as np
import GPy
import random

def func(x):
    return np.sum(np.power(x, 5) - np.power(x, 3))
    
# 20 random samples, each with 10 dimensions
random.seed(2)
random_sample = [[random.uniform(0,3.4) for i in range(10)] for j in range(20)]

# use the first random sample as observed data
y = np.array([func(random_sample[0])])
X = np.array([random_sample[0]])
y.shape = (1, 1)
X.shape = (1, 10)

# different set of dimensions
set_dim = [[np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])],
           [np.array([0, 1]), np.array([2, 3]), np.array([4, 5]), np.array([6, 7]), np.array([8, 9])],
           [np.array([0, 1, 2, 3, 4]), np.array([5, 6, 7, 8, 9])],
           [np.array([0, 1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]]


for i in range(len(set_dim)):
    # new additive kernel based on the active dims
    k = GPy.kern.Add([GPy.kern.RBF(input_dim=len(dims), active_dims=dims)
                      for dims in set_dim[i]])
    
    # increase data set with the next random sample
    y = np.concatenate((y, np.array([[func(random_sample[i+1])]])))
    X = np.concatenate((X, np.array([random_sample[i+1]])))

    model = GPy.models.GPRegression(X, y, k)
    model.optimize()

[Image: output of conda list for gpy, scipy and numpy]

[Image: the paths of the above packages]

1 Answer

Answered by merv

Possible Channel-Mixing Issue

Sometimes package builds from different channels (e.g., anaconda versus conda-forge) are incompatible. The times I've encountered this, it happened when compiled symbols were referenced across packages, and the different build stacks used on the channels used different symbol names, leading to missing symbols when mixing.

I can report that using the exact same package versions as OP, but prioritizing the Conda Forge channel builds, gives me reliable behavior. While not conclusive, this would be consistent with the issue somehow coming from the mixing of the Conda Forge build of GPy with otherwise Anaconda builds of dependencies (e.g., numpy, scipy). Specifically suggestive is the fact that I have the exact same GPy build and that module is where the error originates. At the same time, there is nothing in the error that immediately suggests this is a channel mixing issue.
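One quick way to check for this kind of mixing is to look at the channel column of conda list (a sketch; the package names match OP's setup):

```shell
# List the relevant packages; the last column shows the channel each
# build came from. Interdependent packages split between conda-forge
# and defaults/anaconda entries are a red flag for this issue.
conda list "^(gpy|numpy|scipy)$"
```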

Workaround

In practice, I avoid channel mixing issues by always using YAML definitions to create my environments. This is a helpful practice because it encourages one to explicitly state the channel priority as part of the definition and it makes Conda aware of your preference from the outset. The following environment definition works for me:

gpy_cf.yaml

name: gpy_cf
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6
  - gpy=1.9.6
  - numpy=1.16.2
  - scipy=1.2.1

and using

conda env create -f gpy_cf.yaml
conda activate gpy_cf

Unless you really do need these exact versions, I would remove whatever version constraints are unnecessary (at the very least, drop the patch versions).
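For example, a looser definition along these lines pins only major and minor versions, leaving conda free to pick compatible patch releases (a sketch, assuming you don't depend on specific patch-level behavior):

```yaml
name: gpy_cf
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6
  - gpy=1.9
  - numpy=1.16
  - scipy=1.2
```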


Broken Version

For the record, this is the version that I can replicate the error with:

gpy_mixed.yaml

name: gpy_mixed
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - conda-forge::gpy=1.9.6
  - numpy=1.16.2
  - scipy=1.2.1

In this case, we force gpy to come from Conda Forge and let everything else source from the Anaconda (defaults) channel, similar to the configuration found in OP's environment.