Learning to Cooperate via Policy Search
Leonid Peshkin, Kee-Eung Kim, Nicolas Meuleau, Leslie Kaelbling
Cooperative games are those in which all agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they apply only when the game state is completely observable to all agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
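The core idea of distributed policy search in a cooperative game can be illustrated with a minimal sketch (an assumption-laden toy, not the authors' implementation): two independent REINFORCE-style learners, each holding only its own policy parameters, update from a shared payoff. Here the "game" is a two-action coordination game where the agents are rewarded only when their actions match; the `Agent` class, learning rate, and episode count are all illustrative choices.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

class Agent:
    """Independent learner: owns its policy parameters, sees only the shared reward."""
    def __init__(self, n_actions=2, lr=0.1):
        self.logits = [0.0] * n_actions
        self.lr = lr

    def act(self):
        # Sample an action from the stochastic policy.
        probs = softmax(self.logits)
        r, acc = random.random(), 0.0
        for a, p in enumerate(probs):
            acc += p
            if r < acc:
                return a
        return len(probs) - 1

    def update(self, action, reward):
        # REINFORCE: logits += lr * reward * grad log pi(action).
        # For a softmax policy, grad log pi(a) = onehot(a) - pi.
        probs = softmax(self.logits)
        for a in range(len(self.logits)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            self.logits[a] += self.lr * reward * grad

random.seed(0)
agents = [Agent(), Agent()]
for _ in range(5000):
    acts = [ag.act() for ag in agents]
    reward = 1.0 if acts[0] == acts[1] else 0.0  # shared (cooperative) payoff
    for ag, a in zip(agents, acts):
        ag.update(a, reward)

# Both policies should concentrate on the same action, i.e. reach a
# coordinated local optimum, which here is also a Nash equilibrium.
p0, p1 = softmax(agents[0].logits), softmax(agents[1].logits)
match_prob = sum(a * b for a, b in zip(p0, p1))
```

Note that each agent ascends the gradient of its own expected reward; because the payoff is shared, the agents jointly climb the same objective, which is why local optima of distributed gradient ascent relate to equilibria of the game.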
Keywords: Cooperative games, reinforcement learning, policy search, MDP, POMDP, Nash, soccer
PS Link: http://www.ai.mit.edu/~pesha/Public/UAI00.ps
PDF Link: /papers/00/p489-peshkin.pdf
AUTHOR = "Leonid Peshkin
and Kee-Eung Kim and Nicolas Meuleau and Leslie Kaelbling",
TITLE = "Learning to Cooperate via Policy Search",
BOOKTITLE = "Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-00)",
PUBLISHER = "Morgan Kaufmann",
ADDRESS = "San Francisco, CA",
YEAR = "2000",
PAGES = "489--496"