Advancing sequential decision-making: efficient querying in clustering and best of both worlds for contextual bandits

Date:

October 10, 2024

Author:

Yuko Kuroki - CENTAI Institute

Abstract

The talk covers two topics:

1) Query-efficient correlation clustering with noisy oracle

We study a general clustering setting in which we have $n$ elements to be clustered, and we aim to perform as few queries as possible to an oracle that returns a noisy sample of the weighted similarity between two elements. Our setting encompasses many application domains in which the similarity function is costly to compute and inherently noisy. We introduce two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB): fixed confidence and fixed budget settings. For both settings, we design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees. Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.
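As a rough illustration of the "sampling strategy plus classic approximation algorithm" idea, the sketch below averages repeated noisy oracle queries for every pair and then runs the pivot-based KwikCluster procedure on the estimates. This is a minimal sketch under stated assumptions, not the algorithm from the talk: the uniform sampling rule, the 0.5 threshold, the bounded uniform noise model, and all function names (`make_noisy_oracle`, `estimate_similarities`, `kwik_cluster`) are hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def make_noisy_oracle(true_sim, noise, rng):
    """Wrap a similarity function so each query returns a noisy sample.

    Assumption: additive noise drawn uniformly from [-noise, noise].
    """
    def oracle(i, j):
        return true_sim(i, j) + rng.uniform(-noise, noise)
    return oracle


def estimate_similarities(n, oracle, samples_per_pair):
    """Query every pair the same number of times and average the samples.

    Uniform sampling is the simplest possible strategy; an adaptive
    strategy would spend more queries on pairs that are hard to resolve.
    """
    estimates = {}
    for i, j in combinations(range(n), 2):
        samples = [oracle(i, j) for _ in range(samples_per_pair)]
        estimates[(i, j)] = sum(samples) / samples_per_pair
    return estimates


def kwik_cluster(n, sim, rng, threshold=0.5):
    """Pivot-based clustering (KwikCluster) on estimated similarities.

    Repeatedly pick a random pivot and group it with every remaining
    element whose estimated similarity to the pivot exceeds the threshold.
    """
    remaining = list(range(n))
    clusters = []
    while remaining:
        pivot = rng.choice(remaining)
        cluster = [pivot]
        for v in remaining:
            if v != pivot and sim[(min(pivot, v), max(pivot, v))] > threshold:
                cluster.append(v)
        clusters.append(sorted(cluster))
        members = set(cluster)
        remaining = [v for v in remaining if v not in members]
    return clusters


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical ground truth: elements 0-2 and 3-5 form two clusters.
    truth = lambda i, j: 1.0 if i // 3 == j // 3 else 0.0
    oracle = make_noisy_oracle(truth, noise=0.3, rng=rng)
    estimates = estimate_similarities(6, oracle, samples_per_pair=5)
    print(sorted(kwik_cluster(6, estimates, rng)))
```

The total number of oracle calls here is `samples_per_pair` times the number of pairs, which is the quantity a query-efficient method would try to reduce.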

2) Best-of-both-worlds algorithms for linear contextual bandits

We study best-of-both-worlds algorithms for K-armed linear contextual bandits. Our algorithms achieve nearly optimal regret bounds in both adversarial and stochastic environments, without requiring prior knowledge of which environment they face. In the stochastic setting, we attain a strong performance rate that depends on the number of dimensions and the number of actions, adjusted for the smallest gap between the optimal and suboptimal actions. In the adversarial setting, we obtain regret bounds based on either the cumulative loss of the best action or the cumulative second moment of the losses incurred by our algorithm. Additionally, we develop an algorithm based on Follow-The-Regularized-Leader (FTRL) with a Shannon entropy regularizer, which does not require knowledge of the inverse of the covariance matrix. This algorithm achieves a strong regret rate in the stochastic setting while also obtaining a near-optimal regret bound in the adversarial setting.
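For readers unfamiliar with FTRL, the sketch below shows the standard closed form of the update when the regularizer is the negative Shannon entropy over the probability simplex: the action distribution is proportional to the exponentiated negative cumulative loss estimates. It covers only the generic K-armed update and is an illustration under assumptions, not the contextual algorithm from the talk; the function name `ftrl_shannon_entropy` and the fixed `learning_rate` parameter are hypothetical.

```python
import math


def ftrl_shannon_entropy(cumulative_losses, learning_rate):
    """FTRL step over the simplex with a negative Shannon entropy regularizer.

    Solves  argmin_p  <L, p> + (1 / eta) * sum_i p_i * log(p_i),
    whose solution is p_i proportional to exp(-eta * L_i).
    """
    # Subtract the minimum loss before exponentiating for numerical stability;
    # this does not change the normalized distribution.
    smallest = min(cumulative_losses)
    weights = [math.exp(-learning_rate * (loss - smallest))
               for loss in cumulative_losses]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical cumulative loss estimates for three actions.
    probs = ftrl_shannon_entropy([2.0, 1.0, 4.0], learning_rate=0.5)
    print(probs)  # the second action has the smallest loss and the largest probability
```

Best-of-both-worlds analyses typically depend on how the learning rate is adapted over time, which this fixed-rate sketch does not attempt to reproduce.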