Trying to find a best fit line for multiple noisy sine waves

501 views Asked by At

I'm trying to create an average trace line/best fit line for multiple noisy sine waves. This is the code I've generated to create the sine waves:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import leastsq



x = np.arange(0,10,.05)
noise = np.random.normal(0,.05,200)
wave = np.sin(x)
y = noise + wave

noise2 = np.random.normal(0,.05,200)
y2 = noise2 + wave

noise3 = np.random.normal(0,.05,200)
y3 = noise3 + wave


plt.plot(x, y)
plt.plot(x, y2)
plt.plot(x, y3)
plt.xlabel('x-axis')
plt.ylabel('y-axis')

plt.show()

The problem I'm having when I search online/this site for advice is that most people are creating best fit lines for a group of data points, not multiple lines.

Any advice would be appreciated, thanks!

This is what I've tried so far:

guess_mean = np.mean(y)
guess_std = 3.0*np.std(y)/(2**.5)
guess_phase = 0

first_guess= guess_std*np.sin(x+guess_phase) + guess_mean

plt.plot(first_guess, label='first guess')

but this isn't working, I think its because the periods are off.

1

There are 1 answers

0
Ami Tavory On BEST ANSWER

There are several ways to do so. Here is what I think is the simplest to implement.

First, let's define a function that takes a tuple (a, b, c), and finds the sum of the fits of a * sin(b * x + c) to each of y, y1, y2.

def f(params):
    y_hat = params[0] * np.sin(params[1] * x + params[2])
    return np.linalg.norm(y_hat - y) + np.linalg.norm(y_hat - y2) + np.linalg.norm(y_hat - y3)

With that, we can call scipy.optimize:

import scipy.optimize as optimize

>> optimize.minimize(f, [1,1,1])
   status: 0
  success: True
     njev: 28
     nfev: 140
 hess_inv: array([[  2.58018841e-03,  -7.55654731e-05,   5.12200054e-04],
       [ -7.55654731e-05,   2.40734497e-04,  -1.23067851e-03],
       [  5.12200054e-04,  -1.23067851e-03,   8.57449579e-03]])
      fun: 2.1219465700417786
        x: array([ 0.99811506, -1.00102866,  3.14393633])
  message: 'Optimization terminated successfully.'
  jac: array([ -1.46031380e-06,  -7.89761543e-06,  -1.75833702e-06])

For a sanity check, try the solution ([ 0.99811506, -1.00102866, 3.14393633]):

>> plot(0.99811506 * sin(-1.00102866 * x + 3.14393633))

enter image description here

Vs. the noisy data:

enter image description here