The question is as written in the title
There is a 3x3 grid graph at the above image. We can convert it into junction tree. Then it is possible to use message-passing(product-sum algorithm) for the inference(estimating likelihood/posterior etc). So I wonder why the exact inference in the grid graph is so hard?
Is it impossible to find such a junction tree when the grid goes larger?
The short answer: for a nxn grid, the complexity is at least exponential n.
A junction tree is created from the induced graph of the MRF, which depends on the elimination order (which variables you eliminate first to calculate a marginal). The elimination cost is exponential in the size of the largest clique in the induced graph. See this paper for details.
So even though we can use exact inference on the junction tree, the complexity would be exponential in size of the largest clique in the induced graph of the elimination order that was used.
The best possible elimination order will yield a largest clique size equal to the tree width, which is n for a nxn grid. But there are no efficient algorithms for finding it.