In the PST package, one can estimate the prediction quality of individual sequences using the log-loss, e.g.:
R> ex2 <- c("a-a-b", "a-b-a-a-b", "b-b-b-b-a")
R> ex2 <- seqdef(ex2)
R> predict(S1.p1, ex2, output = "logloss")
logloss
[1] 0.9183
[2] 0.7311
[3] 0.9600
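For context, as far as I understand the PST documentation, the log-loss of a sequence $x = x_1, \ldots, x_\ell$ under a model $S$ is the average negative log-probability per position (I am assuming the default base-2 logarithm here):

$$
\mathrm{logloss}(S, x) = -\frac{1}{\ell} \sum_{i=1}^{\ell} \log_2 P_S(x_i \mid x_1, \ldots, x_{i-1})
$$

So each value above is an average over the positions of a single sequence, and the three sequences have different lengths (3, 5, and 5 positions).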
How do I compare these log-loss values statistically? Is there a way to show that, for example, 0.9183 (sequence 1) is significantly different from 0.9600 (sequence 3)?