I am attempting to build a trivial example of a linear regression for compositional data. I'm using the following code:
from pandas import DataFrame
import numpy as np
from skbio import TreeNode
from gneiss.regression import ols
from IPython.display import display
# Define table of compositions
yTrain = DataFrame({'y1': [0.8, 0.3, 0.5], 'y2': [0.2, 0.7, 0.5]})
# Define predictors for compositions
xTrain = DataFrame({'x1': [1, 3, 2]})
# Once these variables are defined, a regression can be performed.
# These proportions will be converted to balances according to the tree specified,
# and the regression formula is specified to run temp and ph against the
# proportions in a single model.
model = ols('x1', yTrain, xTrain)
model.fit()
xTest = DataFrame({'x1': [1,3]})
yTest = model.predict(xTest)
display(yTest)
I'm getting the error "matrices are not aligned". Any idea how to get this running?
It looks like you have mixed up your x and y matrices between the training and test stages. Your xTest should perhaps be identical in structure to yTrain. In your code, xTest looks like xTrain, which seems to correspond to labels.
The general convention in ML is to use x for inputs and y for outputs. In your case, you have used y for inputs and x for labels during training, and the other way around during testing.
For instance, try setting xTest to the following:
That should get rid of the error. You would ideally do something along the lines of the following:
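To illustrate the x-for-inputs, y-for-outputs convention and why the test inputs need the same column structure as the training inputs, here is a minimal sketch. It uses plain NumPy least squares on the raw proportions, not gneiss's ols and not a balance transform, so it is only an assumption-laden stand-in for the real model. The names x_train, y_train, add_intercept and bad_x_test are made up for this example; the numbers are the ones from the question.

```python
import numpy as np

# x holds the inputs (predictors), y holds the outputs (compositions).
x_train = np.array([[1.0], [3.0], [2.0]])                 # shape (3, 1)
y_train = np.array([[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]])  # shape (3, 2)


def add_intercept(x):
    """Prepend a column of ones so the fit can include an offset term."""
    return np.hstack([np.ones((x.shape[0], 1)), x])


# Fit: least-squares coefficients mapping inputs to outputs, shape (2, 2).
coef, *_ = np.linalg.lstsq(add_intercept(x_train), y_train, rcond=None)

# Predict: the test inputs have the same single column as x_train.
x_test = np.array([[1.0], [3.0]])
y_pred = add_intercept(x_test) @ coef                     # shape (2, 2)

# A test matrix shaped like the outputs (two columns) no longer lines up
# with the fitted coefficients, so the matrix product fails.
bad_x_test = np.array([[0.8, 0.2], [0.3, 0.7]])
try:
    add_intercept(bad_x_test) @ coef
    shapes_align = True
except ValueError:
    shapes_align = False
```

Whatever library you use, the matrix you pass at prediction time has to have the same columns as the one the model was fitted on; a mismatch there is a common source of alignment errors.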