Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
Gergely Neu, Csaba Szepesvari
In this paper we propose a novel gradient algorithm to learn a policy from an expert's observed behavior assuming that the expert behaves optimally with respect to some unknown reward function of a Markovian Decision Problem. The algorithm's aim is to find a reward function such that the resulting optimal policy matches well the expert's observed behavior. The main difficulty is that the mapping from the parameters to policies is both nonsmooth and highly redundant. Resorting to subdifferentials solves the first difficulty, while the second one is overcome by computing natural gradients. We tested the proposed method in two artificial domains and found it to be more reliable and efficient than some previous methods.
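As a rough illustration of the idea in the abstract (tune reward parameters so that the induced policy matches the expert's behavior), the following minimal sketch runs gradient descent on a toy three-state chain. It is not the paper's algorithm: the chain MDP, the one-parameter-per-state reward, the Boltzmann (softmax) policy, the uniformly weighted squared-error loss, and all names such as `q_values` and `fit` are assumptions made for this example. It also substitutes central finite differences for the subdifferentials and uses a plain gradient step rather than a natural gradient.

```python
import math

# Toy chain MDP (hypothetical): 3 states, actions move left (-1) or right (+1).
N_STATES, ACTIONS, GAMMA, BETA = 3, (-1, +1), 0.9, 5.0

def step(s, a):
    """Deterministic transition, clipped at the ends of the chain."""
    return min(max(s + a, 0), N_STATES - 1)

def q_values(theta, iters=200):
    """Value iteration for the reward r(s) = theta[s]; returns Q[s][a]."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        q = [[theta[s] + GAMMA * v[step(s, a)] for a in ACTIONS]
             for s in range(N_STATES)]
        v = [max(row) for row in q]
    return q

def policy(theta):
    """Boltzmann policy over the optimal Q-values (an assumed policy form)."""
    pi = []
    for row in q_values(theta):
        m = max(row)  # subtract the max for numerical stability
        w = [math.exp(BETA * (x - m)) for x in row]
        z = sum(w)
        pi.append([x / z for x in w])
    return pi

def loss(theta, expert):
    """Squared difference between the induced policy and the expert policy."""
    pi = policy(theta)
    return sum((pi[s][a] - expert[s][a]) ** 2
               for s in range(N_STATES) for a in range(len(ACTIONS)))

def grad(theta, expert, eps=1e-5):
    """Central finite differences; a stand-in for the paper's subdifferentials."""
    g = []
    for i in range(len(theta)):
        hi = theta[:]
        hi[i] += eps
        lo = theta[:]
        lo[i] -= eps
        g.append((loss(hi, expert) - loss(lo, expert)) / (2 * eps))
    return g

def fit(expert, steps=200, lr=0.5):
    """Plain gradient descent on the reward parameters (not a natural gradient)."""
    theta = [0.0] * N_STATES
    for _ in range(steps):
        g = grad(theta, expert)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

# Hypothetical expert that always moves right.
expert = [[0.0, 1.0] for _ in range(N_STATES)]
theta = fit(expert)
print("learned reward parameters:", theta)
print("loss:", loss(theta, expert))
```

In this toy problem many different reward vectors induce the same policy, which is a small instance of the redundancy the abstract mentions; the sketch simply returns whichever parameters the descent reaches.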
PDF Link: /papers/07/p295-neu.pdf
AUTHOR = "Gergely Neu and Csaba Szepesvari",
TITLE = "Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods",
BOOKTITLE = "Proceedings of the Twenty-Third Annual Conference on Uncertainty in Artificial Intelligence (UAI-07)",
PUBLISHER = "AUAI Press",
ADDRESS = "Corvallis, Oregon",
YEAR = "2007",
PAGES = "295--302"