Parametric Return Density Estimation for Reinforcement Learning
Tetsuro Morimura, Masashi Sugiyama, Hisashi Kashima, Hirotaka Hachiya, Toshiyuki Tanaka
Most conventional Reinforcement Learning (RL) algorithms aim to optimize decision-making rules in terms of the expected returns. However, especially for risk-management purposes, other risk-sensitive criteria such as the value-at-risk or the expected shortfall are sometimes preferred in real applications. Here, we describe a parametric method for estimating the density of the returns, which allows us to handle various criteria in a unified manner. We first extend the Bellman equation for the conditional expected return to cover a conditional probability density of the returns. Then we derive an extension of the TD-learning algorithm for estimating the return densities in an unknown environment. As test instances, several parametric density estimation algorithms are presented for the Gaussian, Laplace, and skewed Laplace distributions. We show that these algorithms lead to risk-sensitive as well as robust RL paradigms through numerical experiments.
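The core idea in the abstract, extending the Bellman equation from the expected return to the return distribution and learning its parameters by a TD-style update, can be illustrated with a minimal sketch. This is not the authors' exact algorithm: it is a hypothetical tabular example on a deterministic two-step chain, tracking only the first two moments of a Gaussian return model per state (the environment, step sizes, and state space are all assumptions for illustration).

```python
# Illustrative sketch (not the paper's algorithm): TD-style updates for the
# first two moments of the return on a hypothetical deterministic chain
# 0 -> 1 -> terminal, with reward 0 on the first step and 1 on the second.
GAMMA = 0.9
ALPHA = 0.05

# Tabular Gaussian return model per state: mean and second moment.
mu = [0.0, 0.0]
m2 = [0.0, 0.0]

def episode():
    # (state, reward, next_state); None marks the terminal state.
    transitions = [(0, 0.0, 1), (1, 1.0, None)]
    for s, r, s_next in transitions:
        mu_next = mu[s_next] if s_next is not None else 0.0
        m2_next = m2[s_next] if s_next is not None else 0.0
        # Distributional Bellman recursion eta(s) = r + gamma * eta(s')
        # implies moment targets:
        #   E[eta(s)]   = r + gamma * E[eta(s')]
        #   E[eta(s)^2] = r^2 + 2*gamma*r*E[eta(s')] + gamma^2 * E[eta(s')^2]
        mu[s] += ALPHA * (r + GAMMA * mu_next - mu[s])
        m2[s] += ALPHA * (r * r + 2 * GAMMA * r * mu_next
                          + GAMMA ** 2 * m2_next - m2[s])

for _ in range(2000):
    episode()

var = [m2[s] - mu[s] ** 2 for s in range(2)]
print(mu)   # approaches [0.9, 1.0]
print(var)  # approaches [0.0, 0.0] (the chain is deterministic)
```

On a stochastic environment the same moment updates would yield a nonzero variance, and the fitted mean/variance pair then supports risk-sensitive criteria such as the value-at-risk of the modeled Gaussian return.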
PDF Link: /papers/10/p368-morimura.pdf
AUTHOR = "Tetsuro Morimura and Masashi Sugiyama and Hisashi Kashima and Hirotaka Hachiya and Toshiyuki Tanaka",
TITLE = "Parametric Return Density Estimation for Reinforcement Learning",
BOOKTITLE = "Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI-10)",
PUBLISHER = "AUAI Press",
ADDRESS = "Corvallis, Oregon",
YEAR = "2010",
PAGES = "368--375"