I have started using the GPy library in Python to fit Gaussian process models, but I ran into a problem early on with simple plotting.
For my data, I generated a sine wave with a quadratic trend added from the midpoint onward, and GPy built and plotted the initial model without problems.
Data generation:
```python
import numpy as np
import matplotlib.pyplot as plt

## Generating data for regression
# First, a regular sine wave plus Gaussian noise (standard deviation 0.3)
x = np.linspace(0, 40, num=300)
noise1 = np.random.normal(0, 0.3, 300)
y = np.sin(x) + noise1

# Second, an upward quadratic trend starting midway, with its own noise (standard deviation 0.1)
temp = x[150:]
noise2 = 0.004 * temp**2 + np.random.normal(0, 0.1, 150)
y[150:] = y[150:] + noise2
plt.plot(x, y)
```
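Written out, the generating process above is as follows, with $x_i$ the `linspace` grid and indices starting at 0 (`np.random.normal` takes a standard deviation, so 0.3 and 0.1 are standard deviations, not variances):

$$
x_i = \frac{40\,i}{299},\qquad
y_i = \sin(x_i) + \varepsilon_i + \mathbb{1}[i \ge 150]\,\bigl(0.004\,x_i^{2} + \eta_i\bigr),\qquad
\varepsilon_i \sim \mathcal{N}(0,\,0.3^{2}),\quad \eta_i \sim \mathcal{N}(0,\,0.1^{2}),
$$

for $i = 0, \dots, 299$.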
Initial model:
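```python
import GPy

## Pre-processing: GPy expects 2-D arrays of shape (n_samples, n_dims)
X = np.expand_dims(x, axis=1)
Y = np.expand_dims(y, axis=1)

## Model
kernel = GPy.kern.RBF(input_dim=1, variance=1., lengthscale=1.)
model1 = GPy.models.GPRegression(X, Y, kernel)

## Plotting (the error below comes from plotly, so the plotly backend is in use)
GPy.plotting.change_plotting_library('plotly')
fig = model1.plot()
GPy.plotting.show(fig, filename='basic_gp_regression_notebook')
```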
However, this model is mis-specified: the data was generated from sin(X) and X^2 rather than from X itself, so I created a second model that takes those two features as its inputs:
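```python
# Use sin(X) and X^2 as the two input columns
X_all = np.hstack((np.sin(X), np.square(X)))
# The inputs now have two columns, so the kernel needs input_dim=2; a separate
# kernel object also keeps model2's hyperparameters independent of model1's
kernel2 = GPy.kern.RBF(input_dim=2, variance=1., lengthscale=1.)
model2 = GPy.models.GPRegression(X_all, Y, kernel2)
fig = model2.plot()
GPy.plotting.show(fig, filename='basic_correct_gp_regression_notebook')
```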
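For reference, an RBF kernel over the two feature columns is still a kernel on the original $x$, because each row of `X_all` is computed from a single $x$ value. Using the RBF form $k(r) = \sigma^2 \exp(-r^2/2)$ with variance $\sigma^2$ and one shared lengthscale $\ell$ (as in the `kernel2` above), it reads:

$$
k(x, x') = \sigma^{2}\exp\!\left(-\frac{(\sin x - \sin x')^{2} + (x^{2} - x'^{2})^{2}}{2\ell^{2}}\right)
$$

So `model2` can be read as a one-dimensional GP in $x$ with this composite kernel, which is why plotting its predictions against X is meaningful even though the model only sees the two features.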
However, now the plotting call fails with this error:

```
Invalid value of type 'builtins.str' received for the 'size' property of scatter.marker Received value: '5'
```
I assume this happens because the model was only given sin(X) and X^2 as inputs, so the plot does not know to use X as the x-axis.
How can I fix this?
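One way to sidestep the built-in 2-D plot, sketched here with plain NumPy rather than GPy so that it runs on its own: because sin(X) and X^2 are computed row by row from x, you can map a dense grid of x values through the same features, evaluate the model there, and plot the mean and uncertainty against x yourself. The block below is an exact GP regression posterior with an RBF kernel over the two features. The function names (`features`, `rbf`, `gp_predict_on_x`) are hypothetical, and the defaults (variance 1, lengthscale 1 per feature, noise variance 1.0, which I believe matches `GPRegression`'s default) are assumptions chosen to mirror the question's unoptimized kernel, not values fitted to the data. With GPy itself, the corresponding step would be `mean, var = model2.predict(np.column_stack((np.sin(x_grid), x_grid**2)))`.

```python
import numpy as np


def features(x):
    """Hypothetical helper: map 1-D inputs x to rows [sin(x), x^2], one row per x."""
    x = np.asarray(x, dtype=float)
    return np.column_stack((np.sin(x), x**2))


def rbf(A, B, variance=1.0, lengthscales=(1.0, 1.0)):
    """RBF kernel: variance * exp(-0.5 * sum_d ((a_d - b_d) / l_d)^2)."""
    ell = np.asarray(lengthscales, dtype=float)
    diff = (A[:, None, :] - B[None, :, :]) / ell   # shape (len(A), len(B), 2)
    return variance * np.exp(-0.5 * np.sum(diff**2, axis=-1))


def gp_predict_on_x(x_train, y_train, x_new, variance=1.0,
                    lengthscales=(1.0, 1.0), noise_var=1.0):
    """Exact GP posterior mean and latent std at x_new, computed in feature space."""
    F, F_new = features(x_train), features(x_new)
    y_train = np.asarray(y_train, dtype=float)
    K = rbf(F, F, variance, lengthscales) + noise_var * np.eye(len(F))
    K_s = rbf(F_new, F, variance, lengthscales)   # shape (len(x_new), len(x_train))
    L = np.linalg.cholesky(K)                       # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^-1 y
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)                   # L^-1 K_s^T
    var = variance - np.sum(v**2, axis=0)           # k(x*, x*) = variance for an RBF
    return mean, np.sqrt(np.maximum(var, 0.0))
```

Given `x_grid = np.linspace(0, 40, num=1000)` and `mean, std = gp_predict_on_x(x, y, x_grid)`, the calls `plt.plot(x_grid, mean)` and `plt.fill_between(x_grid, mean - 2 * std, mean + 2 * std, alpha=0.3)` draw the fit against x. Note that the X^2 column spans 0 to 1600 while sin(X) stays in [-1, 1], so with one shared lengthscale of 1 the X^2 column dominates the distances and most off-diagonal kernel values are close to zero; separate per-column lengthscales (GPy's `RBF(..., ARD=True)`) together with `model2.optimize()` are the usual way to let the two columns be scaled independently.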