Published in: 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)

DOI: 10.1109/adprl.2014.7010620

Pareto Upper Confidence Bounds algorithms: an empirical study

Proceedings article published in 2014 by Madalina M. Drugan, Ann Nowé, and Bernard Manderick
This paper is available in a repository.


Preprint: archiving allowed
Postprint: archiving allowed
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Many real-world stochastic environments are inherently multi-objective environments with conflicting objectives. The multi-objective multi-armed bandits (MOMABs) extend the classical, i.e. single-objective, multi-armed bandits to reward vectors, and multi-objective optimisation techniques are often required to design mechanisms with an efficient exploration/exploitation trade-off. In this paper, we propose the improved Pareto Upper Confidence Bound (iPUCB) algorithm, which straightforwardly extends the single-objective improved UCB algorithm to reward vectors by deleting the suboptimal arms. The goal of the improved Pareto UCB algorithm, i.e. iPUCB, is to identify the set of best arms, or the Pareto front, within a fixed budget of arm pulls. We experimentally compare the performance of the proposed Pareto upper confidence bound algorithm with the Pareto UCB1 algorithm and the Hoeffding race on a bi-objective example from an industrial control application, i.e. the engagement of wet clutches. We also propose a new regret metric based on the Kullback-Leibler divergence to measure the performance of a multi-objective multi-armed bandit algorithm. We show that iPUCB outperforms the other two tested algorithms on the given multi-objective environment.
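
The full algorithm appears only in the paper itself, so the following is a minimal Python sketch of the elimination scheme the abstract describes: an improved-UCB-style round structure in which an arm is deleted once another surviving arm's pessimistic mean estimate Pareto-dominates its optimistic one. The names (`ipucb`, `pull`, `pareto_dominates`), the round quotas, and the guard on the log term are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def pareto_dominates(u, v):
    """True if reward vector u Pareto-dominates v: u is at least as
    good in every objective and strictly better in at least one."""
    return bool(np.all(u >= v) and np.any(u > v))

def ipucb(pull, n_arms, n_objectives, budget):
    """Improved-UCB-style elimination with Pareto-dominance deletion.
    `pull(arm)` is assumed to return a reward vector with components
    in [0, 1]. Hypothetical sketch, not the authors' reference code."""
    active = list(range(n_arms))
    means = np.zeros((n_arms, n_objectives))
    counts = np.zeros(n_arms, dtype=int)
    delta = 1.0   # current guess of the sub-optimality gap
    spent = 0

    while spent < budget and len(active) > 1:
        # Guard against a non-positive log late in the run.
        log_term = max(np.log(budget * delta ** 2), 1.0)
        quota = int(np.ceil(2.0 * log_term / delta ** 2))

        # Sample every surviving arm up to this round's quota.
        for arm in active:
            while counts[arm] < quota and spent < budget:
                reward = np.asarray(pull(arm), dtype=float)
                counts[arm] += 1
                means[arm] += (reward - means[arm]) / counts[arm]
                spent += 1

        # Shared confidence radius for this round.
        radius = np.sqrt(log_term / (2.0 * quota))

        # Delete arm i if some arm j's pessimistic estimate
        # Pareto-dominates arm i's optimistic estimate.
        active = [
            i for i in active
            if not any(
                pareto_dominates(means[j] - radius, means[i] + radius)
                for j in active if j != i
            )
        ]
        delta /= 2.0  # halve the assumed gap, as in improved UCB

    return active  # surviving arms: the estimated Pareto front

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_means = rng.uniform(0.2, 0.9, size=(5, 2))  # 5 arms, 2 objectives
    front = ipucb(lambda a: rng.binomial(1, true_means[a]),
                  n_arms=5, n_objectives=2, budget=5000)
    print("estimated Pareto front:", front)
```

Because Pareto-optimal arms do not dominate one another, the elimination stalls once only front members survive, and the remaining budget is spent refining their estimates.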
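The abstract does not define the KL-based regret metric, so the function below is an assumed reading offered purely for illustration: it measures the KL divergence between the empirical arm-pull frequencies and an idealised policy that spreads its pulls uniformly over the true Pareto-optimal arms. The paper's actual definition may differ.

```python
import numpy as np

def kl_regret(pull_counts, pareto_front, eps=1e-12):
    """Hypothetical KL-based regret: divergence between the empirical
    arm-pull distribution and an idealised uniform distribution over
    the true Pareto-optimal arms. Illustrative assumption only."""
    p = np.asarray(pull_counts, dtype=float)
    p = p / p.sum()                          # empirical play distribution
    q = np.full_like(p, eps)                 # near-zero mass off the front
    q[list(pareto_front)] = 1.0 / len(pareto_front)
    q = q / q.sum()                          # renormalise
    mask = p > 0                             # 0 * log(0/q) contributes 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# e.g. kl_regret([400, 350, 50, 30, 20], pareto_front=[0, 1]) is small
# when most pulls went to the Pareto-optimal arms 0 and 1.
```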