In a perfect-information environment, where we can know the resulting state after an action (like playing chess), is there any reason to use Q-learning rather than TD (temporal difference) learning?
As far as I understand, TD learning tries to learn the state value V(s), while Q-learning learns the state-action value Q(s, a). Does that mean Q-learning learns more slowly, since there are more state-action pairs than states alone?
Q-Learning is a TD (temporal difference) learning method.
I think what you mean to compare is TD(0) (learning V) versus Q-learning (learning Q).
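To make the distinction concrete, the standard tabular update rules are (with $\alpha$ the learning rate and $\gamma$ the discount factor):

$$V(s_t) \leftarrow V(s_t) + \alpha\left[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\right]$$

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right]$$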
I would say it depends on whether your actions are deterministic. Even if you have the transition function, acting greedily under TD(0) can be expensive: at every step you must compute the expected value of each available action by summing over its possible next states. In Q-learning, that one-step lookahead is already summarized in the Q-value, so action selection is just an argmax over the stored values.
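Here is a minimal sketch of that difference in action selection, assuming a small tabular setting. The names `transition_probs`, `reward`, and the state/action encodings are hypothetical, purely to illustrate the extra expectation TD(0) needs at decision time:

```python
import numpy as np

# Assumed tabular setup (illustrative only):
#   transition_probs[s][a] is a dict {next_state: probability}
#   reward(s, a, s_next) returns the immediate reward
#   V is a 1-D array indexed by state, Q is a 2-D array indexed by [state, action]

def greedy_action_from_V(s, V, actions, transition_probs, reward, gamma=0.99):
    """Acting greedily with only V(s) needs a one-step lookahead through the
    model: an expectation over next states for every candidate action."""
    best_a, best_value = None, -np.inf
    for a in actions:
        expected = sum(
            p * (reward(s, a, s_next) + gamma * V[s_next])
            for s_next, p in transition_probs[s][a].items()
        )
        if expected > best_value:
            best_a, best_value = a, expected
    return best_a

def greedy_action_from_Q(s, Q):
    """With Q(s, a) the lookahead is already baked into the stored values:
    just take the argmax over the actions for state s."""
    return int(np.argmax(Q[s]))
```

If the environment is deterministic, each inner sum collapses to a single term, so the cost difference shrinks; with stochastic transitions (or many actions), the V-based selection does noticeably more work per step.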