Parametric Return Density Estimation for Reinforcement Learning
Tetsuro Morimura, Masashi Sugiyama, Hisashi Kashima, Hirotaka Hachiya, Toshiyuki Tanaka
Most conventional reinforcement learning (RL) algorithms aim to optimize decision-making rules in terms of the expected return. However, especially for risk management purposes, other risk-sensitive criteria such as the value-at-risk or the expected shortfall are sometimes preferred in real applications. Here, we describe a parametric method for estimating the density of the returns, which allows us to handle various criteria in a unified manner. We first extend the Bellman equation for the conditional expected return to cover a conditional probability density of the returns. Then we derive an extension of the TD-learning algorithm for estimating the return densities in an unknown environment. As test instances, several parametric density estimation algorithms are presented for the Gaussian, Laplace, and skewed Laplace distributions. Through numerical experiments, we show that these algorithms lead to risk-sensitive as well as robust RL paradigms.
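To illustrate why an estimated return density lets several criteria be handled in a unified manner, the following minimal sketch computes the expected return, the value-at-risk, and the expected shortfall from one Gaussian return density. The function name `gaussian_risk_measures`, the parameters `mu`, `sigma`, and `alpha`, and the lower-tail convention used for both risk measures are assumptions made for this illustration, not definitions taken from the paper.

```python
from statistics import NormalDist


def gaussian_risk_measures(mu, sigma, alpha=0.05):
    """Risk criteria of a Gaussian return density N(mu, sigma^2).

    Assumed convention (illustrative only): both risk measures are taken
    on the lower tail of the return distribution at level alpha.
    """
    standard = NormalDist()        # standard normal N(0, 1)
    z = standard.inv_cdf(alpha)    # alpha-quantile of N(0, 1)

    expected_return = mu
    # Value-at-risk: the alpha-quantile of the return distribution.
    value_at_risk = mu + sigma * z
    # Expected shortfall: mean return conditional on falling at or
    # below the alpha-quantile, E[R | R <= value_at_risk].
    expected_shortfall = mu - sigma * standard.pdf(z) / alpha
    return expected_return, value_at_risk, expected_shortfall


if __name__ == "__main__":
    mean, var, es = gaussian_risk_measures(mu=1.0, sigma=2.0, alpha=0.05)
    print(f"expected return:    {mean:.4f}")
    print(f"value-at-risk:      {var:.4f}")
    print(f"expected shortfall: {es:.4f}")
```

All three criteria are read off the same pair of density parameters, so a learner that estimates those parameters per state and action could, under these assumptions, switch criteria without re-estimating anything.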
