How to build a Markov Decision Process model in Python for string data?


I have a dataset whose entries are represented by URIs. I'd like to build a model that can predict the predecessor and successor of a given sample in my sequential data. The dataset looks like this: [image: Sequential Dataset]

For example, given "HTTP://example.com/112", the model should generate "HTTP://example.com/296" as the predecessor and "HTTP://example.com/322" as the successor. I'd like to build a Markov Decision Process model for this dataset to get that result. It would be great if anyone could help me find a suitable Python package. I checked the "hmmlearn" package, with which I can implement a hidden Markov model, but my data doesn't have hidden states. Also, I'm not sure whether I need to convert these data to numerical form before I can build a Markov model.

Thank you in advance!


There is 1 answer

Viktoriya Malyasova

If there are no hidden states (and no actions or rewards, which a Markov Decision Process would require), you have a plain Markov chain. Markov chains are not hard to implement on your own, but if you want a library, there is pomegranate:

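# Note: this import and from_samples match pomegranate releases before 1.0;
# the 1.0 rewrite changed the API, so pin an older version to run this as written.
from pomegranate import MarkovChain

# Say you have two sequences of clicks (states can stay as strings):
sequences = [['uri1', 'uri5', 'uri3', 'uri5'], ['uri2', 'uri3', 'uri1', 'uri2']]

# Fit a first-order Markov chain by counting the observed transitions
model = MarkovChain.from_samples(sequences)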

Learned transition probabilities (each row reads: current state, next state, probability):

print(model.distributions[1])
uri5    uri5    0.0
uri5    uri3    1.0
uri5    uri1    0.0
uri5    uri2    0.0
uri3    uri5    0.5
uri3    uri3    0.0
uri3    uri1    0.5
uri3    uri2    0.0
uri1    uri5    0.5
uri1    uri3    0.0
uri1    uri1    0.0
uri1    uri2    0.5
uri2    uri5    0.0
uri2    uri3    1.0
uri2    uri1    0.0
uri2    uri2    0.0
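
Since a first-order Markov chain is just a table of transition frequencies, the "implement it on your own" route can be sketched in plain Python. The following is a minimal sketch, not the pomegranate implementation: the helper names `fit_transitions` and `most_likely` are hypothetical, and fitting a second chain on the reversed sequences to guess predecessors is an assumption about what the question needs. States are used directly as dictionary keys, so the URIs do not have to be converted to numbers.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate first-order transition probabilities from sequences of states."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every (current, next) pair of adjacent states
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    # Normalise each row of counts into probabilities
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def most_likely(transitions, state):
    """Return the most probable neighbour of `state`, or None if it was never seen."""
    options = transitions.get(state)
    if not options:
        return None
    # Ties are broken by whichever neighbour was observed first
    return max(options, key=options.get)


# Same toy data as in the answer above
sequences = [['uri1', 'uri5', 'uri3', 'uri5'], ['uri2', 'uri3', 'uri1', 'uri2']]

forward = fit_transitions(sequences)                           # state -> successor
backward = fit_transitions([seq[::-1] for seq in sequences])   # state -> predecessor

print(most_likely(forward, 'uri5'))    # most likely successor of uri5
print(most_likely(backward, 'uri2'))   # most likely predecessor of uri2
```

On this toy data the `forward` table holds the same non-zero probabilities as the pomegranate output above; pairs that were never observed are simply absent instead of being listed with probability 0.0.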