I have little background knowledge of Machine Learning, so please forgive me if my question seems silly.
Based on what I've read, the best model-free reinforcement learning algorithm to this date is Q-Learning, where each state,action pair in the agent's world is given a q-value, and at each state the action with the highest q-value is chosen. The q-value is then updated as follows:
Q(s,a) = (1-α)Q(s,a) + α(R(s,a,s') + (max_a' * Q(s',a'))) where α is the learning rate.
Apparently, for problems with high dimensionality, the number of states become astronomically large making q-value table storage infeasible.
So the practical implementation of Q-Learning requires using Q-value approximation via generalization of states aka features. For example if the agent was Pacman then the features would be:
- Distance to closest dot
- Distance to closest ghost
- Is Pacman in a tunnel?
And then instead of q-values for every single state you would only need to only have q-values for every single feature.
So my question is:
Is it possible for a reinforcement learning agent to create or generate additional features?
Some research I've done:
This post mentions A Geramifard's iFDD method
- http://www.icml-2011.org/papers/473_icmlpaper.pdf
- http://people.csail.mit.edu/agf/Files/13RLDM-GQ-iFDD+.pdf
which is a way of "discovering feature dependencies", but I'm not sure if that is feature generation, as the paper assumes that you start off with a set of binary features.
Another paper that I found was apropos is Playing Atari with Deep Reinforcement Learning, which "extracts high level features using a range of neural network architectures".
I've read over the paper but still need to flesh out/fully understand their algorithm. Is this what I'm looking for?
Thanks
It seems like you already answered your own question :)
Feature generation is not part of the Q-learning (and SARSA) algorithm. In a process which is called preprocessing you can however use a wide array of algorithms (of which you showed some) to generate/extract features from your data. Combining different machine learning algorithms results in hybrid architectures, which is a term you might look into when researching what works best for your problem.
Here is an example of using features with SARSA (which is very similar to Q-learning). Whether the papers you cited are helpful for your scenario, you'll have to decide for yourself. As always with machine learning, your approach is highly problem-dependent. If you're in robotics and it's hard to define discrete states manually, a neural network might be helpful. If you can think of heuristics by yourself (like in the pacman example) then you probably won't need it.