Approximate Kalman Filter Q-Learning for Continuous State-Space MDPs
Charles Tripp, Ross Shachter
We seek to learn an effective policy for a Markov Decision Process (MDP) with continuous states via Q-Learning. Given a set of basis functions over state-action pairs, we search for a corresponding set of linear weights that minimizes the mean Bellman residual. Our algorithm uses a Kalman filter model to estimate those weights, and we have developed a simpler approximate Kalman filter model that outperforms the current state-of-the-art projected TD-Learning methods on several standard benchmark problems.
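The sketch below illustrates the general idea of tracking linear Q-function weights with a Kalman filter. It is not the authors' model or their approximate variant, and it rests on several assumptions. The weights w follow a random walk. Each observed reward r is treated as a noisy linear measurement r ≈ h·w + noise, where h = phi(s,a) - gamma*phi(s',a*) and a* is the greedy next action under the current weights, so the innovation equals the sampled Bellman residual. The noise variances are set by hand. The class name KalmanQ and the parameters prior_var, process_var and obs_var are hypothetical. Plain Python lists keep it dependency-free.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m: Matrix, v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


class KalmanQ:
    """Hypothetical sketch: Q(s, a) = w . phi(s, a) with w tracked by a Kalman filter.

    Assumes a random-walk weight model and Gaussian reward noise; not the paper's
    exact (approximate) Kalman filter model.
    """

    def __init__(self, n_features: int, gamma: float, prior_var: float = 100.0,
                 process_var: float = 0.0, obs_var: float = 1.0) -> None:
        self.gamma = gamma
        self.process_var = process_var  # assumed random-walk variance per step
        self.obs_var = obs_var          # assumed reward (measurement) noise variance
        self.w: Vector = [0.0] * n_features
        self.P: Matrix = [[prior_var if i == j else 0.0 for j in range(n_features)]
                          for i in range(n_features)]

    def q_value(self, phi: Sequence[float]) -> float:
        return dot(self.w, phi)

    def update(self, phi_sa: Sequence[float], reward: float,
               next_phis: Sequence[Sequence[float]]) -> float:
        """One filter step on a transition (s, a, r, s').

        next_phis holds phi(s', a') for every next action a'; an empty list marks
        a terminal s'. Returns the Bellman residual measured before the update.
        """
        n = len(self.w)
        if next_phis:
            # Greedy next action under the current weights (linearizes the max).
            best_next = max(next_phis, key=self.q_value)
            h = [x - self.gamma * y for x, y in zip(phi_sa, best_next)]
        else:
            h = list(phi_sa)
        # Predict: the random-walk process model inflates the covariance.
        for i in range(n):
            self.P[i][i] += self.process_var
        # Innovation r + gamma*Q(s', a*) - Q(s, a), i.e. the sampled Bellman residual.
        residual = reward - dot(h, self.w)
        Ph = mat_vec(self.P, h)
        innovation_var = dot(h, Ph) + self.obs_var
        gain = [x / innovation_var for x in Ph]
        # Correct the weights and shrink the covariance: P <- P - K (P h)^T.
        self.w = [wi + ki * residual for wi, ki in zip(self.w, gain)]
        self.P = [[self.P[i][j] - gain[i] * Ph[j] for j in range(n)]
                  for i in range(n)]
        return residual


# Usage: one state, one action, reward 1, gamma 0.9, so the true Q-value is 10.
agent = KalmanQ(n_features=1, gamma=0.9)
for _ in range(1000):
    agent.update([1.0], 1.0, [[1.0]])
print(round(agent.q_value([1.0]), 2))  # prints 9.99
```

With zero process noise this is sequential Bayesian linear regression on the Bellman residual, so after n identical transitions the estimate is 10n/(n+1), pulled toward the zero prior mean. A positive process_var lets the filter discount old transitions, which matters because the greedy targets change as w changes.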
PDF Link: /papers/13/p644-tripp.pdf
AUTHOR = "Charles Tripp and Ross Shachter",
TITLE = "Approximate Kalman Filter Q-Learning for Continuous State-Space MDPs",
BOOKTITLE = "Proceedings of the Twenty-Ninth Annual Conference on Uncertainty in Artificial Intelligence (UAI-13)",
PUBLISHER = "AUAI Press",
ADDRESS = "Corvallis, Oregon",
YEAR = "2013",
PAGES = "644--653"