Knowledge Gradient for Multi-objective Multi-Armed Bandit Algorithms

Drugan, Madalina M.; Yahyaa, Saba Q.; Manderick, Bernard

Proceedings article published in 2014 by Madalina M. Drugan, Saba Q. Yahyaa, Bernard Manderick

Abstract

We extend the knowledge gradient (KG) policy to the multi-objective multi-armed bandit problem in order to explore the Pareto optimal arms efficiently. We consider two partial order relations for ordering the mean vectors: the Pareto order and scalarization functions. Pareto-KG finds the optimal arms using Pareto search, while scalarized KG transforms each multi-objective arm into a single-objective arm and then finds the optimal arms. To measure the performance of the proposed algorithms, we propose three regret measures. We compare the knowledge gradient policy with UCB1 on a multi-objective multi-armed bandit problem, where KG outperforms UCB1.
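The two ways of ordering mean vectors mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, maximization of every objective is assumed, and a linear weighted sum stands in for the scalarization function, which the abstract does not specify. The KG exploration step itself is not shown.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if mean vector u Pareto-dominates v (maximization assumed)."""
    no_worse = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return no_worse and strictly_better


def pareto_front(means: List[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean vectors no other arm dominates."""
    return [
        i for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


def linear_scalarize(mean: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a mean vector to one value (assumed linear scalarization)."""
    return sum(w * m for w, m in zip(weights, mean))


def best_scalarized_arm(means: List[Sequence[float]],
                        weights: Sequence[float]) -> int:
    """Index of the arm with the largest scalarized mean."""
    values = [linear_scalarize(m, weights) for m in means]
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical two-objective problem with four arms.
means = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4)]
front = pareto_front(means)                    # arm 3 is dominated by arm 1
best = best_scalarized_arm(means, (0.9, 0.1))  # weights favor objective 1
```

Under the Pareto order several arms can be optimal at once, whereas a fixed weight vector picks a single arm; a set of weight vectors would be needed to recover more of the front.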
