Package issue trying to plot a Hidden Markov Model

175 views Asked by At

I am trying to fit a Hidden Markov Model to some animal movement data.

However, when multiple rows of my data have the exact same x and y coordinate values (no movement of the animal), the modeling function introduces a 0 for step and NA for angle. The resulting coefficients of the HMM are coherent in terms of biological interpretation, but when it comes to further plotting, I cannot because of these Na's.

To fit the HMM, I am using the moveHMM library in R.

> packageVersion("moveHMM")
[1] ‘1.9’
> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22000)

I have a dataset structured as follows:

> str(data)
'data.frame':   16515 obs. of  3 variables:
 $ ID: num  1 1 1 1 1 1 1 1 1 1 ...
 $ x : num  582629 582629 582628 582628 582628 ...
 $ y : num  -2195359 -2195359 -2195358 -2195358 -2195358 ... 

Then, following the package guidelines (https://cran.r-project.org/web/packages/moveHMM/vignettes/moveHMM-guide.pdf), I transform the data to calculate steps and angles:

# Convert data to moveData object using prepData
move_data <- prepData(data, type = "UTM", coordNames = c("x", "y"))

And here comes the introduction of Na's:

> print(na_count_per_column)
   ID  step angle     x     y 
    0     1  1010     0     0 

The modelling comes next:

# Fit the HMM with the adjusted parameters
hmm_model <- fitHMM(data = move_data, nbStates = 3, stepPar0 = stepPar0, anglePar0 = anglePar0, formula = ~1)

And gives a fair result:

> print(hmm_model)
Value of the maximum log-likelihood: 27526.22 

Step length parameters:
----------------------
               state 1      state 2   state 3
mean      3.426185e-15 8.306167e-02 4.9993906
sd        8.356259e+09 1.131159e-01 1.0002724
zero-mass 1.000000e+00 2.256299e-09 0.4078989

Turning angle parameters:
------------------------
                  state 1      state 2       state 3
mean          0.001312714  0.001319986 -3.313687e-10
concentration 2.827507199 22.348484965  1.000000e+01

Regression coeffs for the transition probabilities:
--------------------------------------------------
             1 -> 2    1 -> 3    2 -> 1    2 -> 3     3 -> 1    3 -> 2
intercept -6.914171 -37.97005 -9.650105 -25.74398 -0.4295815 -1.285255

Transition probability matrix:
-----------------------------
             [,1]        [,2]         [,3]
[1,] 9.990074e-01 0.000992619 3.231352e-17
[2,] 6.441467e-05 0.999935585 6.599375e-12
[3,] 3.376541e-01 0.143501890 5.188441e-01

Initial distribution:
--------------------
[1] 2.674974e-10 1.000000e+00 1.316610e-14

And then the issue arises:

> plot(hmm_model, type = "states")
Decoding states sequence... DONE
Error in if (maxdens > ymax & maxdens < 1.5 * ymax) { : 
  missing value where TRUE/FALSE needed

When trying the method on the elk data provided in the package, the plotting is successful. Even though, there are also some NA's being introduced.

> print(na_count_per_column)
        ID       step      angle          x          y dist_water 
         0          4         10          0          0          0 

Finally, as stated in the package guidelines,

within the package moveHMM,zero-inflation will automatically be included if there are zero steps.

2

There are 2 answers

3
Dan C. Wlodarski On BEST ANSWER

If your data is not too sensitive (and the operation I'm about to suggest isn't too intractible), a little preprocessing might solve this.

A convolutional kernel run over your entire data set could apply a miniscule amount of PRNG Brownian noise/motion. This would prevent those discontinuities of absolutely zero motion.

R does support convolutional kernels.

0
Théo Michelot On

It looks like the problem is caused by state 1, where the step length distribution is estimated to be gamma with mean 3.426185e-15 and standard deviation 8.356259e+09. This happens because you have very many steps of length exactly zero, and so state 1 is basically just used for zero inflation (as indicated by the zero mass parameter equal to 1).

There are a few options to circumvent this problem:

  1. As Dan suggested, you could add a little bit of random jitter to the data to remove the steps of length zero. Given that only very small perturbations are needed, I don't think that the approach matters much, and you could possibly use rnorm, i.e.:
# this assumes that x and y are in km, and that it's
# reasonable to add noise with standard deviation 1 meter
data$x <- data$x + rnorm(nrow(data), 0, 0.001)
data$y <- data$y + rnorm(nrow(data), 0, 0.001)

move_data <- prepData(data, type = "UTM")
  1. The long periods during which the animal is completely still could be removed from the data altogether, to avoid the problem you have, and to save computation time. The idea would be to split a track where there is a long period of no movement, which would require changing the ID column to indicate that some data have been removed. This would be a little more difficult to implement in a general way.

As a side note if anyone else has this problem: this error usually indicates that some parameters take extreme values, and so that there is a problem when trying to compute the state-dependent distributions. In many cases, it is resolved by trying different sets of starting values, or by trying a simpler model formulation; but I don't think this is applicable to your situation.