I have some large arrays each with i elements, call them X, Y, Z, for which I need to find some values a, b--where a and b are real numbers between 0 and 1--such that, for the following functions,
r = X - a*Y - b*Z
r_av = Sum(r)/i
rms = Sum((r - r_av)^2), summing over the i pixels
I want to minimize the rms. Basically I'm looking to minimize the scatter in r, and thus need to find the right a and b to do that. So far I have thought to do this in nested loops in one of two ways: either 1)just looping through a range of possible a,b and then selecting out the smallest rms, or 2)inserting a while statement so that the loop will terminate once rms stops decreasing with decreasing a,b for instance. Here's some pseudocode for these:
1) List
for a = 1
for b = 1
calculate m
b = b - .001
a = a - .001
loop 1000 times
sort m values, from smallest
print (a,b) corresponding to smallest m
2) Terminate
for a = 1
for b = 1
calculate m
while m > previous step,
b = b - .001
a = a - .001
Is one of these preferable? Or is there yet another, better way to go about this? Any tips would be greatly appreciated.
There is already a handy formula for least squares fitting.
I came up with two different ways to solve your problem.
For the first one, consider the matrix
K
:In your case,
A
andB
are defined as:Apply the formula to find C that minimizes the error
A * C - B
:Then the result is:
Also, note that numpy already provides linalg.lstsq that does the exact same thing:
A simpler way is to define A as:
Then solve it against
X
to get the coefficients and the mean:If you need a proof of this result, have a look at this post.
Example: