Uncertainty in Artificial Intelligence
Learning Partial Policies to Speedup MDP Tree Search
Jervis Pinto, Alan Fern
Abstract:
A popular approach for online decision making in large MDPs is time-bounded tree search. The effectiveness of tree search, however, is largely influenced by the action branching factor, which limits the search depth given a time bound. An obvious way to reduce action branching is to consider only a subset of potentially good actions at each state as specified by a provided partial policy. In this work, we consider offline learning of such partial policies with the goal of speeding up search without significantly reducing decision-making quality. Our first contribution is to study learning algorithms based on reducing our learning problem to i.i.d. supervised learning. We give a reduction-style analysis of three such algorithms, each making different assumptions, which relates the supervised learning objectives to the sub-optimality of search using the learned partial policies. Our second contribution is to describe concrete implementations of the algorithms within the popular framework of Monte-Carlo tree search. Finally, the third contribution is to evaluate the learning algorithms in two challenging MDPs with large action branching factors, showing that the learned partial policies can significantly improve the anytime performance of Monte-Carlo tree search.
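Illustration (not from the paper): the sketch below shows the general idea of restricting UCT-style tree search to the actions kept by a partial policy, so the effective branching factor is the pruned set size rather than the full action set. The TOP_K cutoff, the scorer, and the toy MDP interface (actions/step) are all hypothetical stand-ins; in the paper the partial policy is learned offline via reductions to i.i.d. supervised learning.

# A minimal sketch (not the paper's implementation) of pruning MCTS action
# branching with a learned partial policy.  The scorer, TOP_K, and the toy
# MDP interface below are all hypothetical stand-ins.

import math
import random
from collections import defaultdict

TOP_K = 3  # number of actions the partial policy keeps at each state (assumed)


def partial_policy(state, actions, score_actions):
    # Keep only the TOP_K highest-scoring actions; score_actions stands in for
    # an offline-learned scorer (e.g. a classifier fit by supervised learning).
    ranked = sorted(actions, key=lambda a: score_actions(state, a), reverse=True)
    return ranked[:TOP_K]


def uct_search(root, simulator, score_actions, n_iters=2000, max_depth=50, c=1.4):
    # Simplified UCT in which selection/expansion only ever considers the
    # partial policy's actions, so the effective branching factor is TOP_K.
    N = defaultdict(int)      # visit counts for (state, action)
    Q = defaultdict(float)    # running mean return for (state, action)
    Ns = defaultdict(int)     # visit counts for states

    for _ in range(n_iters):
        state, trajectory, done = root, [], False
        for _ in range(max_depth):                      # cap episode length
            actions = partial_policy(state, simulator.actions(state), score_actions)
            unvisited = [a for a in actions if N[(state, a)] == 0]
            if unvisited:
                action = random.choice(unvisited)
            else:
                action = max(actions, key=lambda a: Q[(state, a)]
                             + c * math.sqrt(math.log(Ns[state]) / N[(state, a)]))
            next_state, reward, done = simulator.step(state, action)
            trajectory.append((state, action, reward))
            state = next_state
            if done:
                break

        ret = 0.0
        for s, a, r in reversed(trajectory):            # back up discounted return
            ret = r + 0.95 * ret
            Ns[s] += 1
            N[(s, a)] += 1
            Q[(s, a)] += (ret - Q[(s, a)]) / N[(s, a)]

    root_actions = partial_policy(root, simulator.actions(root), score_actions)
    return max(root_actions, key=lambda a: Q[(root, a)])


class ToyChainMDP:
    # Hypothetical chain MDP: action 0 moves right toward a terminal reward,
    # every other action moves left; a largish action set makes pruning matter.
    def actions(self, state):
        return list(range(8))

    def step(self, state, action):
        nxt = state + 1 if action == 0 else max(0, state - 1)
        done = nxt >= 10
        return nxt, (1.0 if done else 0.0), done


if __name__ == "__main__":
    # Stand-in for an offline-learned scorer over (state, action) pairs.
    def learned_scorer(state, action):
        return 1.0 if action == 0 else 0.1 * random.random()

    best = uct_search(root=0, simulator=ToyChainMDP(), score_actions=learned_scorer)
    print("chosen root action:", best)

Because search time is spent only on the pruned actions, a fixed time bound buys deeper lookahead, which is the anytime-performance gain the abstract refers to.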
Pages: 672-681
PDF Link: /papers/14/p672-pinto.pdf
BibTex:
@INPROCEEDINGS{Pinto14,
AUTHOR = "Jervis Pinto and Alan Fern",
TITLE = "Learning Partial Policies to Speedup MDP Tree Search",
BOOKTITLE = "Proceedings of the Thirtieth Annual Conference on Uncertainty in Artificial Intelligence (UAI-14)",
PUBLISHER = "AUAI Press",
ADDRESS = "Corvallis, Oregon",
YEAR = "2014",
PAGES = "672--681"
}

