Uncertainty in Artificial Intelligence
REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs
Peter Bartlett, Ambuj Tewari
Abstract:
We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, we show a regret bound of Õ(HS√(AT)). We also relate the span to various diameter-like quantities associated with the MDP, demonstrating how our results improve on previous regret bounds.
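The abstract's episode-level selection rule can be illustrated with a toy sketch. This is an illustrative reconstruction, not the paper's pseudocode: the function names, the candidate representation as (gain, bias-vector) pairs, and the constant C are all assumptions made for this sketch. The idea shown is the one the abstract states: among candidate models in a confidence set, prefer high average reward but penalize a large span of the optimal bias vector.

```python
# Toy sketch (hypothetical names) of a span-regularized selection rule in the
# spirit of REGAL: maximize gain minus C times the span of the bias vector.

def span(bias):
    """Span seminorm of a bias vector: max_i h[i] - min_i h[i]."""
    return max(bias) - min(bias)

def select_model(candidates, C):
    """candidates: list of (gain, bias_vector) pairs, one per model in the
    confidence set. Returns the index of the span-regularized maximizer."""
    scores = [gain - C * span(bias) for gain, bias in candidates]
    return max(range(len(scores)), key=scores.__getitem__)

candidates = [
    (1.0, [0.0, 5.0, 2.0]),   # higher gain, but span 5.0
    (0.8, [0.0, 0.5, 0.2]),   # lower gain, span only 0.5
]
print(select_model(candidates, C=0.1))  # → 1: regularization favors small span
```

With C = 0.1 the scores are 1.0 − 0.5 = 0.5 and 0.8 − 0.05 = 0.75, so the second model wins despite its lower gain; as C → 0 the rule reduces to plain optimism in the gain.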
Pages: 35-42
PDF Link: /papers/09/p35-bartlett.pdf
BibTex:
@INPROCEEDINGS{Bartlett09,
AUTHOR = "Peter Bartlett and Ambuj Tewari",
TITLE = "REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs",
BOOKTITLE = "Proceedings of the Twenty-Fifth Annual Conference on Uncertainty in Artificial Intelligence (UAI-09)",
PUBLISHER = "AUAI Press",
ADDRESS = "Corvallis, Oregon",
YEAR = "2009",
PAGES = "35--42"
}

