I've got an MDP problem with a 3x4 grid environment, with the possible actions Up/Down/Right/Left and a 0.8 chance of moving in the intended direction and a 0.1 chance for each adjoining direction (e.g. for Up: 0.1 chance to go Left, 0.1 chance to go Right).
Now what I need to do is calculate the possible outcomes of starting in (1,1) and running the following sequence of actions:
[Up, Up, Right, Right, Right]
And also calculate the probability of ending up in each field (for each possible outcome) with this action sequence. How can I do this efficiently, i.e. without walking through each of the up to 3^5 = 243 possible outcome branches?
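For concreteness, here is the brute-force enumeration I am trying to avoid. It is only a sketch: I am assuming columns x = 1..4 and rows y = 1..3, that a move off the grid leaves you in place, and I have left out any walls or terminal states.

```python
# Brute-force enumeration of all outcome branches (3 per action, so
# 3^5 = 243 leaves), accumulating the probability of each final cell.
# Assumes x in 1..4, y in 1..3, and staying in place at grid borders.
from collections import defaultdict

MOVES = {"Up": (0, 1), "Down": (0, -1), "Right": (1, 0), "Left": (-1, 0)}
PERP = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
        "Right": ("Up", "Down"), "Left": ("Up", "Down")}

def move(cell, direction):
    dx, dy = MOVES[direction]
    x, y = cell[0] + dx, cell[1] + dy
    return (x, y) if 1 <= x <= 4 and 1 <= y <= 3 else cell

def enumerate_outcomes(cell, actions, prob, result):
    if not actions:
        result[cell] += prob
        return
    side_a, side_b = PERP[actions[0]]
    for direction, p in [(actions[0], 0.8), (side_a, 0.1), (side_b, 0.1)]:
        enumerate_outcomes(move(cell, direction), actions[1:], prob * p, result)

result = defaultdict(float)
enumerate_outcomes((1, 1), ["Up", "Up", "Right", "Right", "Right"], 1.0, result)
for cell, p in sorted(result.items()):
    print(cell, round(p, 4))
```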
Thanks in advance!

It sounds like you are working on a reinforcement learning (RL) problem. Such problems are usually solved with the Bellman equation and Q-learning.
You may also find these lecture notes helpful: http://cs229.stanford.edu/notes/cs229-notes12.pdf
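The core of Q-learning is a sample-based Bellman backup. A toy sketch of a single update (the table size and the transition here are just illustrative, not your map):

```python
# One tabular Q-learning update on a single observed transition
# (s, a, r, s2); alpha is the learning rate, gamma the discount factor.
import numpy as np

Q = np.zeros((12, 4))        # e.g. 12 cells x 4 actions for a 3x4 grid
alpha, gamma = 0.1, 0.95
s, a, r, s2 = 0, 3, 0.0, 4   # an illustrative transition, not real data
Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
```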
Once you have finished learning, you can repeat the whole process, i.e. run the sequence [Up, Up, Right, Right, Right] many times and estimate the probability of each outcome from those runs, as sketched below. And after learning, the efficiency concern matters less, because the agent reaches the correct answer almost immediately.
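A minimal sketch of that estimation, assuming the same grid conventions as in the question (x = 1..4, y = 1..3, staying in place at grid borders, no walls or terminal states):

```python
# Monte Carlo estimate of the outcome distribution of a fixed action
# sequence under the 0.8/0.1/0.1 transition noise.
import random
from collections import Counter

MOVES = {"Up": (0, 1), "Down": (0, -1), "Right": (1, 0), "Left": (-1, 0)}
PERP = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
        "Right": ("Up", "Down"), "Left": ("Up", "Down")}

def step(cell, action):
    side_a, side_b = PERP[action]
    actual = random.choices([action, side_a, side_b], weights=[0.8, 0.1, 0.1])[0]
    dx, dy = MOVES[actual]
    x, y = cell[0] + dx, cell[1] + dy
    return (x, y) if 1 <= x <= 4 and 1 <= y <= 3 else cell

counts = Counter()
n_runs = 100_000
for _ in range(n_runs):
    cell = (1, 1)
    for action in ["Up", "Up", "Right", "Right", "Right"]:
        cell = step(cell, action)
    counts[cell] += 1

for cell, count in sorted(counts.items()):
    print(cell, count / n_runs)
```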
I think this example is from AIMA, right? Actually, I have a few questions about the approach: it doesn't seem to give the right answer for my case when you approach it purely theoretically.
And I also wrote some simple code for this with Gym.
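A minimal sketch of what that could look like, using FrozenLake-v1 (a 4x4 slippery gridworld) as a stand-in for the 3x4 map, and assuming the Gym 0.26+ reset/step API:

```python
# Tabular Q-learning on a small Gym gridworld. FrozenLake-v1 is used
# as a stand-in here; swap in an environment matching the 3x4 map.
import gym
import numpy as np

env = gym.make("FrozenLake-v1", is_slippery=True)
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(5000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Sample-based Bellman update.
        Q[state, action] += alpha * (
            reward + gamma * np.max(Q[next_state]) - Q[state, action]
        )
        state = next_state
```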