Mixed-Effects Contextual Bandits

Lee, Kyungbok; Paik, Myunghee Cho; Oh, Min-Hwan; Kim, Gi-Soo

Published in

Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), p. 13409-13417, 2024

DOI: 10.1609/aaai.v38i12.29243


Journal article published in 2024 by Kyungbok Lee, Myunghee Cho Paik, Min-Hwan Oh, Gi-Soo Kim


Abstract

We study a novel variant of the contextual bandit problem with multi-dimensional reward feedback formulated as a mixed-effects model, where the correlations among the multiple feedback signals are induced by shared stochastic coefficients called random effects. We propose a novel algorithm, Mixed-Effects Contextual UCB (ME-CUCB), which achieves a regret bound of $\tilde{O}(d\sqrt{mT})$ after $T$ rounds, where $d$ is the dimension of the contexts and $m$ is the dimension of the outcomes, with either known or unknown covariance structure. This bound is tighter than that of the naive canonical linear bandit algorithm, which ignores the correlations among rewards. We prove a lower bound of $\Omega(d\sqrt{mT})$, matching the upper bound up to logarithmic factors. To our knowledge, this is the first work to provide a regret analysis for mixed-effects models and for algorithms involving weighted least-squares estimators. Our theoretical analysis faces a significant technical challenge: the error terms do not constitute martingales, since the weights depend on the rewards. We overcome this challenge with an argument based on covering numbers, which is of theoretical interest in its own right. We provide numerical experiments that demonstrate the advantage of the proposed algorithm and support the theoretical claims.
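To make the setting concrete, the following is a minimal sketch of a generic linear mixed-effects reward model and a weighted (generalized) least-squares estimate of the fixed effects when the covariance is known. It is not the ME-CUCB algorithm: there is no arm selection and no confidence bound. The model form `y = X @ beta + Z @ b + eps`, the Gaussian random effects, the ridge term, and all names (`simulate_round`, `gls_estimate`, `G`, `sigma`) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def simulate_round(rng, beta, G, sigma, m):
    """Draw one round of m correlated rewards (hypothetical model).

    The shared random effect b couples the m outcomes, so their
    covariance is Z G Z^T + sigma^2 I rather than diagonal.
    """
    d, q = beta.shape[0], G.shape[0]
    X = rng.normal(size=(m, d))                  # fixed-effect design (contexts)
    Z = rng.normal(size=(m, q))                  # random-effect design
    b = rng.multivariate_normal(np.zeros(q), G)  # random effects for this round
    eps = sigma * rng.normal(size=m)             # independent noise
    y = X @ beta + Z @ b + eps
    return X, Z, y


def gls_estimate(rounds, G, sigma, ridge=1e-6):
    """Weighted least squares for beta with known covariance structure."""
    d = rounds[0][0].shape[1]
    A = ridge * np.eye(d)   # small ridge term keeps A invertible
    c = np.zeros(d)
    for X, Z, y in rounds:
        V = Z @ G @ Z.T + sigma**2 * np.eye(len(y))  # reward covariance
        Vinv_X = np.linalg.solve(V, X)               # V^{-1} X
        A += X.T @ Vinv_X                            # X^T V^{-1} X
        c += Vinv_X.T @ y                            # X^T V^{-1} y (V symmetric)
    return np.linalg.solve(A, c)


rng = np.random.default_rng(0)
beta_true = np.array([1.0, -0.5, 2.0])  # d = 3 fixed effects
G = np.eye(2)                           # q = 2 random-effect covariance
sigma = 0.5
rounds = [simulate_round(rng, beta_true, G, sigma, m=4) for _ in range(2000)]
beta_hat = gls_estimate(rounds, G, sigma)
```

Here the weights are fixed because `G` and `sigma` are treated as known. When the covariance must be estimated from the observed rewards, the weights become data-dependent, which is the situation the abstract identifies as breaking the martingale structure of the error terms.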
