I am new to Kalman filters and am trying to use one to predict missing values and to get smoothed observations from GPS data (latitude and longitude).
I am using pykalman and my code block looks like this:
import numpy as np
from pykalman import KalmanFilter

data = data[['Lat', 'Lon']]
measurements = np.asarray(data, dtype='float')
# mask invalid (NaN) entries so pykalman treats them as missing observations
measurements_masked = np.ma.masked_invalid(measurements)

# initial state of the form [x0, x0_dot, x1, x1_dot]
initial_state_mean = [
    measurements[0, 0],
    0,
    measurements[0, 1],
    0
]

initial_state_covariance = [[10, 0, 0, 0],
                            [ 0, 1, 0, 0],
                            [ 0, 0, 1, 0],
                            [ 0, 0, 0, 1]]

# transition matrix to estimate new position given old position
# (constant-velocity model: position += velocity at each step)
transition_matrix = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1]
]

# only the two position components are observed
observation_matrix = [
    [1, 0, 0, 0],
    [0, 0, 1, 0]
]

kf = KalmanFilter(
    transition_matrices=transition_matrix,
    observation_matrices=observation_matrix,
    initial_state_mean=initial_state_mean,
)

filtered_state_means = np.zeros((len(measurements), 4))
filtered_state_covariances = np.zeros((len(measurements), 4, 4))

for i in range(len(measurements)):
    if i == 0:
        filtered_state_means[i] = initial_state_mean
        filtered_state_covariances[i] = initial_state_covariance
    else:
        filtered_state_means[i], filtered_state_covariances[i] = (
            kf.filter_update(
                filtered_state_means[i - 1],
                filtered_state_covariances[i - 1],
                observation=measurements_masked[i])
        )
where data is a pandas DataFrame from which latitude and longitude are extracted; missing positions are stored as NaN (which is why masked_invalid is used).
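For illustration, a minimal frame of that shape could look like the following (the coordinate values here are made up, with NaN marking the missing fixes):

import numpy as np
import pandas as pd

# made-up example of the expected input: Lat/Lon columns, NaN for missing fixes
data = pd.DataFrame({
    'Lat': [48.1000, 48.1001, np.nan, 48.1003, 48.1004],
    'Lon': [11.5000, 11.5002, np.nan, 11.5006, 11.5008],
})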
Is this logic correct? Also, what I want to do is use the observations closest to a missing observation to predict its value. For example, in an array of 10 samples, if the 5th, 6th and 7th observations are missing, it makes more sense to predict the 5th using the 4th sample, the 7th using the 8th sample, and the 6th by averaging the 5th and 7th.
Does this approach make sense? If yes, how can it be done with pykalman? If not, what can be done to predict missing values more accurately when many consecutive values are missing?
I think a Kalman filter is a great fit for what you want. Below is an example with some dummy data where I've masked (hidden) some samples/measurements from the filter. As you can see, the KF does a good job of reconstructing the three points missing from the middle. The KF takes care of the fact that observations nearer a particular time-stamp are the most relevant for estimating that time-stamp (through the assumed dynamics).
This is slightly optimistic, as the input data perfectly matches the assumption made in the KF (that objects move with constant velocity), but the KF should also work well when the velocity is actually changing. I've posted a previous, longer answer on the pykalman library here: https://stackoverflow.com/a/43568887/4988601, which might help in understanding how Kalman filters work.
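A minimal sketch of this kind of setup, assuming a made-up straight-line track with three consecutive samples masked out and using pykalman's em/smooth calls (the track, noise scale, sample count and variable names are all illustrative, not from the question's data):

import numpy as np
from pykalman import KalmanFilter

# dummy constant-velocity track with a little measurement noise
rng = np.random.RandomState(0)
true_lat = np.linspace(48.000, 48.009, 10)
true_lon = np.linspace(11.000, 11.018, 10)
measurements = np.ma.asarray(
    np.column_stack([true_lat, true_lon]) + rng.normal(0, 1e-4, (10, 2)))

# hide the 5th, 6th and 7th samples from the filter
measurements[4:7] = np.ma.masked

# same constant-velocity model as in the question
transition_matrix = [[1, 1, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 0, 1]]
observation_matrix = [[1, 0, 0, 0],
                      [0, 0, 1, 0]]

kf = KalmanFilter(
    transition_matrices=transition_matrix,
    observation_matrices=observation_matrix,
    initial_state_mean=[measurements[0, 0], 0, measurements[0, 1], 0])

# optionally learn the noise covariances from the (masked) data, then smooth
kf = kf.em(measurements, n_iter=5)
smoothed_means, smoothed_covs = kf.smooth(measurements)

# columns 0 and 2 of the state are the estimated Lat and Lon
print(smoothed_means[4:7, [0, 2]])   # estimates for the three hidden samples

kf.smooth runs a forward filtering pass followed by a backward pass, so each missing sample is estimated from the measurements both before and after the gap, which is essentially the nearest-neighbour behaviour described in the question. Plotting the smoothed positions alongside the masked measurements shows the three hidden points being filled in along the track.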