Learning is planning: near Bayesoptimal reinforcement learning via MonteCarlo tree search
John Asmuth, Michael Littman
Abstract:
Bayesoptimal behavior, while welldefined, is often difficult to achieve. Recent advances in the use of MonteCarlo tree search (MCTS) have shown that it is possible to act nearoptimally in Markov Decision Processes (MDPs) with very large or infinite state spaces. Bayesoptimal behavior in an unknown MDP is equivalent to optimal behavior in the known beliefspace MDP, although the size of this beliefspace MDP grows exponentially with the amount of history retained, and is potentially infinite. We show how an agent can use one particular MCTS algorithm, Forward Search Sparse Sampling (FSSS), in an efficient way to act nearly Bayesoptimally for all but a polynomial number of steps, assuming that FSSS can be used to act efficiently in any possible underlying MDP.
Keywords:
Pages: 1926
PS Link:
PDF Link: /papers/11/p19asmuth.pdf
BibTex:
@INPROCEEDINGS{Asmuth11,
AUTHOR = "John Asmuth
and Michael Littman",
TITLE = "Learning is planning: near Bayesoptimal reinforcement learning via MonteCarlo tree search",
BOOKTITLE = "Proceedings of the TwentySeventh Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI11)",
PUBLISHER = "AUAI Press",
ADDRESS = "Corvallis, Oregon",
YEAR = "2011",
PAGES = "1926"
}

