Department Seminar Series

Multi-objective reinforcement learning

30th June 2015, 13:00, Ashton Lecture Theatre
Prof Ann Nowé
Artificial Intelligence Lab
Vrije Universiteit Brussel
Belgium

Abstract

Many real-world problems involve the optimization of multiple, possibly conflicting objectives. Multi-objective reinforcement learning (MORL) is a generalisation of standard reinforcement learning in which the scalar reward signal is extended to multiple feedback signals, in essence one for each objective. In this talk, I present an overview of our multi-criteria n-armed bandit approaches, as well as a novel temporal difference learning algorithm that integrates the Pareto dominance relation into a reinforcement learning approach. This Pareto Q-learning algorithm is a multi-policy algorithm that learns a set of Pareto dominating policies. A key element of the algorithm is that the immediate reward vector is estimated separately from the set of expected future discounted reward vectors. This decomposition allows us to update the sets and to exploit the learned policies consistently throughout the state space.
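The sketch below shows the two ingredients named in the abstract. The first is the Pareto dominance relation. The second is a set-valued Q-estimate built by adding an immediate reward vector to each vector in a set of discounted future reward vectors. It assumes every objective is maximised. It also assumes `nd_next` has already been reduced to the non-dominated vectors reachable from the successor state. The function names `dominates`, `non_dominated` and `q_set` are hypothetical and chosen for illustration. This is not the speaker's implementation, and it omits how the average reward is tracked and how actions are selected from the learned sets.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]  # one component per objective (assumed: all maximised)


def dominates(u: Vector, v: Vector) -> bool:
    """True if u Pareto-dominates v: at least as good everywhere, strictly better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def non_dominated(vectors: List[Vector]) -> List[Vector]:
    """Return the vectors not dominated by any other (duplicates dropped, order kept)."""
    unique = list(dict.fromkeys(vectors))
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


def q_set(avg_reward: Vector, nd_next: List[Vector], gamma: float) -> List[Vector]:
    """Set-valued Q-estimate: the separately estimated immediate reward vector
    is added to every discounted vector in the non-dominated future set."""
    if not nd_next:  # e.g. terminal successor: only the immediate reward remains
        return [avg_reward]
    return [tuple(r + gamma * f for r, f in zip(avg_reward, fut)) for fut in nd_next]


if __name__ == "__main__":
    # (0.4, 0.4) is dominated by (0.5, 0.5); the other three are mutually non-dominated.
    print(non_dominated([(1, 0), (0, 1), (0.5, 0.5), (0.4, 0.4)]))
    print(q_set((1.0, 0.0), [(2.0, 0.0), (0.0, 2.0)], gamma=0.5))
```

Because the reward vector is kept apart from the future set, a new reward observation changes every vector in the set at once, and the future set itself is never re-averaged.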