I am doing a project involving scientific computing. The following are three variables and their values I got after some experiments.
There is also an equation with three unknowns, a, b and c:
x=(a+0.98)/y+(b+0.7)/z+c
How do I get values of a, b, c using the above? Is this possible in MATLAB?
This sounds like a regression problem. Assuming that the unexplained errors in the measurements are Gaussian distributed, you can find the parameters via least squares. Basically, you'd rewrite the equation so that it has the form
ma + nb + oc = p
which gives you 6 equations with 3 unknowns (a, b, c), and these parameters can be found through least squares. Multiplying both sides of your equation by yz and moving the constant terms to the right-hand side, we get:
za + yb + yzc = xyz - 0.98z - 0.7y
As such, m = z, n = y, o = yz, p = xyz - 0.98z - 0.7y. I'll leave it to you as an exercise to verify that my algebra is right. You can then form the matrix equation:
Ax = d
Each row of A is [z y yz] for one measurement, and the matching entry of d is p. We would have 6 equations and we want to solve for x, where x = [a b c]^{T}. (Here x is the vector of unknowns, not the measured variable x from your equation.) To solve for x, you can employ what is known as the pseudoinverse to retrieve the parameters that minimize the squared error between the true output and the output generated by these parameters on the same input data. In other words:
x = A^{+}d
A^{+} is the pseudoinverse of the matrix A and is matrix-vector multiplied with the vector d.
To put our thoughts into code, we would define our input data, form the A matrix and d vector, where each row shared between them is one equation, and then find our parameters. You can use the mldivide (\) operator to do the job, since it returns the least-squares solution for an overdetermined system:
params stores the parameters a, b and c, and we get:
If you want to double-check how close the values are, you can simply use the expression in your post with these parameters and compare with each of the values in x:
You can see that it's not exactly close, but the parameters you got would be the best in a least-squares error sense.
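The least-squares procedure described above can be sketched in MATLAB roughly as follows. This is a minimal sketch under stated assumptions: the arrays y, z and x and the values a_true, b_true and c_true are made-up, noise-free data chosen so the fit can be checked, not the measurements from the question, so the output is not comparable to the numbers discussed for the question's data.

```matlab
% Hypothetical data: NOT the measurements from the question.
% Six noise-free samples generated from assumed "true" parameters.
a_true = 1.5; b_true = -0.4; c_true = 2.0;
y = [1; 2; 3; 4; 5; 6];
z = [2; 1; 4; 3; 6; 5];
x = (a_true + 0.98) ./ y + (b_true + 0.7) ./ z + c_true;

% Each row is one equation: a*z + b*y + c*y*z = x*y*z - 0.98*z - 0.7*y
A = [z, y, y .* z];
d = x .* y .* z - 0.98 * z - 0.7 * y;

% Least-squares solution of the overdetermined system.
% For a full-column-rank A this agrees with pinv(A) * d.
params = A \ d;
params_pinv = pinv(A) * d;

% Plug the fitted parameters back into the original equation
a = params(1); b = params(2); c = params(3);
x_pred = (a + 0.98) ./ y + (b + 0.7) ./ z + c;

% Left column: measured x, right column: x predicted by the fit
disp([x, x_pred]);
```

Because the made-up data contains no noise, the two columns should agree up to floating-point error. With real measurements they will differ, which is the situation described above.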
Bonus - Fitting with RANSAC
You can see that some of the predicted values (right column in the output) are further off than others. This is because we used all of the points in your data to find the model. One technique for making the model estimation more robust is RANSAC, or RANdom SAmple Consensus. The basic idea is that, for a certain number of iterations, you randomly sample the fewest points necessary to fit a model. Once you have this model, you compute the overall error you would get by using these parameters to describe all of your data. You keep randomly choosing points, fitting the model and computing the error, and the iteration that produced the smallest error supplies the parameters you keep as the overall model.
One error measure that we can define is the sum of absolute differences between the true x points and the predicted x points. There are many other measures, such as the sum of squared errors, but let's stick with something simple for now. Looking at the formulation above, we need a minimum of three equations to determine a, b and c, so for each iteration we'd randomly select three points (without replacement, I might add), find our model, determine the error, and keep iterating while tracking the parameters with the smallest error. Therefore, you could write a RANSAC algorithm like so:
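A minimal sketch of such a loop, assuming made-up data: y, z, x, the a_true, b_true and c_true values, the corrupted fourth sample and the iteration count num_iters are all hypothetical, not taken from the question. It follows the simplified scheme described above (keep the sample with the smallest sum of absolute differences) rather than a full inlier-counting RANSAC, and its output is not comparable to the results discussed for the question's data.

```matlab
% Hypothetical data: NOT the measurements from the question.
a_true = 1.5; b_true = -0.4; c_true = 2.0;   % assumed "true" parameters
y = [1; 2; 3; 4; 5; 6];
z = [2; 1; 4; 3; 6; 5];
x = (a_true + 0.98) ./ y + (b_true + 0.7) ./ z + c_true;
x(4) = x(4) + 0.5;                 % corrupt one sample to mimic an outlier

% Linearized system: a*z + b*y + c*y*z = x*y*z - 0.98*z - 0.7*y
A = [z, y, y .* z];
d = x .* y .* z - 0.98 * z - 0.7 * y;

num_iters = 200;                   % assumed iteration count
best_err = Inf;
best_params = zeros(3, 1);

for iter = 1:num_iters
    idx = randperm(numel(x), 3);   % three distinct points, no replacement
    As = A(idx, :);
    if rank(As) < 3                % skip degenerate samples
        continue;
    end
    p = As \ d(idx);               % exact solve of the 3-by-3 system
    x_pred = (p(1) + 0.98) ./ y + (p(2) + 0.7) ./ z + p(3);
    err = sum(abs(x - x_pred));    % sum of absolute differences, all points
    if err < best_err              % keep the best model seen so far
        best_err = err;
        best_params = p;
    end
end

% Left column: measured x, right column: x predicted by the kept model
x_best = (best_params(1) + 0.98) ./ y + (best_params(2) + 0.7) ./ z + best_params(3);
disp(best_params.');
disp([x, x_best]);
```

The sampling is random, so the kept parameters can change from run to run. Raising num_iters, or enumerating every three-point subset when the data set is this small, makes the result more stable.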
When I run RANSAC on your data, I get for my parameters:
Comparing this with our x, we get:
As you can see, the values are improved, especially the fourth and sixth points. Compare it to the previous version:
You can see that the second value is worse than in the previous version, but the other numbers are much closer to the true values.
Have fun!