Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents

doi:10.48550/arXiv.2407.01887

Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents

In-context decision-making is an important capability of artificial general intelligence, which Large Language Models (LLMs) have effectively demonstrated in various scenarios. However, LLMs often face challenges when dealing with numerical contexts, and limited attention has been paid to evaluating their performance through preference feedback generated by the environment. This paper is the first to investigate the performance of LLMs as decision-makers in the context of Dueling Bandits (DB). We compare GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, Llama 3.1, and o1-preview against eight well-established DB algorithms. Our results reveal that LLMs, particularly GPT-4 Turbo, quickly identify the Condorcet winner, thus outperforming existing state-of-the-art algorithms in terms of weak regret. Nevertheless, LLMs struggle to converge even when explicitly prompted to do so and are sensitive to prompt variations. To overcome these issues, we introduce a hybrid algorithm: LLM-Enhanced Adaptive Dueling (LEAD), which takes advantage of both in-context decision-making capabilities of LLMs and theoretical guarantees inherited from classic DB algorithms. We show that LEAD has theoretical guarantees on both weak and strong regret and validate its robustness even with noisy and adversarial prompts. The design of such an algorithm sheds light on how to enhance trustworthiness for LLMs used in decision-making tasks where performance robustness matters.

Publication:

arXiv e-prints

Pub Date:

July 2024

DOI:

10.48550/arXiv.2407.01887

arXiv:

arXiv:2407.01887

Bibcode:

2024arXiv240701887X

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Computation and Language

NASA/ADS

Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents

Abstract