Economics and Computation Series

Strategically Efficient Exploration for Multi-Agent Reinforcement Learning

1st December 2021, 13:00
Robert Loftin
TU Delft

Abstract

As a basis for exploration, the principle of optimism under uncertainty has led to a number of important theoretical and empirical results in sample-efficient reinforcement learning. In this talk, we discuss the role of optimistic exploration in multi-agent reinforcement learning and address potential issues that arise when applying optimism to RL in zero-sum games. We show that the direct application of optimism can lead to highly inefficient exploration in such games, where "cooperative" exploration focuses on outcomes that are unrealistic in "competitive" play. We then introduce a notion of "strategically efficient" exploration and demonstrate theoretically and empirically that strategically efficient learning algorithms can significantly outperform their optimistic counterparts, while retaining the same worst-case sample complexity guarantees.

Additional Materials