Uncertainty in Artificial Intelligence
Off-policy TD(λ) with a true online equivalence
Hado van Hasselt, Rupam Mahmood, Richard Sutton
Van Seijen and Sutton (2014) recently proposed a new version of the linear TD(λ) learning algorithm that is exactly equivalent to an online forward view and that empirically performed better than its classical counterpart in both prediction and control problems. However, their algorithm is restricted to on-policy learning. In the more general case of off-policy learning, in which the policy whose outcome is predicted and the policy used to generate data may be different, their algorithm cannot be applied. One reason for this is that the algorithm bootstraps and thus is subject to instability problems when function approximation is used. A second reason that true online TD(λ) cannot be used for off-policy learning is that the off-policy case requires sophisticated importance sampling in its eligibility traces. To address these limitations, we generalize their equivalence result and use this generalization to construct the first online algorithm to be exactly equivalent to an off-policy forward view. We show that this algorithm, named true online GTD(λ), empirically outperforms GTD(λ) (Maei, 2011), which was derived from the same objective as our forward view but lacks the exact online equivalence. In the general theorem that allows us to derive this new algorithm, we encounter a new general eligibility-trace update.
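The abstract builds on true online TD(λ) of van Seijen and Sutton (2014), whose distinguishing feature is an exact online equivalence with a forward view. As an illustration only, here is a minimal pure-Python sketch of the on-policy linear true online TD(λ) update with a dutch-style eligibility trace, run over one episode with constant step size α, discount γ, and trace parameter λ. It is not the off-policy true online GTD(λ) algorithm introduced in this paper. The exact update form, the function name `true_online_td_episode`, and the convention that the terminal state has an all-zero feature vector are assumptions of this sketch, not details taken from the paper.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def true_online_td_episode(
    features: Sequence[Sequence[float]],
    rewards: Sequence[float],
    w: Sequence[float],
    alpha: float,
    gamma: float,
    lam: float,
) -> List[float]:
    """Hypothetical sketch: one episode of on-policy linear true online TD(lambda).

    features[t] is x_t and rewards[t] is R_{t+1}. The state reached after the
    last step is assumed terminal, with an all-zero feature vector.
    """
    if len(features) != len(rewards):
        raise ValueError("need one reward per feature vector")
    n = len(w)
    w = list(w)
    e = [0.0] * n      # eligibility trace, reset at the start of the episode
    v_old = 0.0        # value of the current state under the previous weights
    zero = [0.0] * n   # assumed terminal feature vector
    for t, (x, r) in enumerate(zip(features, rewards)):
        x_next = features[t + 1] if t + 1 < len(features) else zero
        v = dot(w, x)
        v_next = dot(w, x_next)
        delta = r + gamma * v_next - v
        # Dutch trace: (e . x) uses the trace from before this step's update.
        ex = dot(e, x)
        e = [gamma * lam * ei + xi - alpha * gamma * lam * ex * xi
             for ei, xi in zip(e, x)]
        # The (v - v_old) correction terms give the exact forward-view equivalence.
        w = [wi + alpha * (delta + v - v_old) * ei - alpha * (v - v_old) * xi
             for wi, ei, xi in zip(w, e, x)]
        v_old = v_next
    return w
```

Two sanity checks follow from the forward-view equivalence: with `lam=0.0` the update reduces to linear TD(0), and with `lam=1.0` the weights at the end of the episode coincide with applying Monte Carlo updates toward the full discounted returns one step at a time. The correction terms involving `v_old` are what separate this update from conventional accumulating-trace TD(λ).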
Pages: 330-339
PDF Link: /papers/14/p330-van_hasselt.pdf
@INPROCEEDINGS{vanHasselt14,
AUTHOR = "Hado van Hasselt and Rupam Mahmood and Richard Sutton",
TITLE = "Off-policy {TD($\lambda$)} with a true online equivalence",
BOOKTITLE = "Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI-14)",
ADDRESS = "Corvallis, Oregon",
YEAR = "2014",
PAGES = "330--339"
}
