Are language models rational? The case of coherence norms and belief revision
Abstract
Do norms of rationality apply to machine learning models, and in particular to language models? In this paper we investigate this question by focusing on a special subset of rational norms: coherence norms. We consider both logical coherence norms and coherence norms tied to the strength of belief. To make sense of the latter, we introduce the Minimal Assent Connection (MAC) and propose a new account of credence that captures the strength of belief in language models. This proposal assigns strength of belief uniformly, simply on the basis of the model's internal next-token probabilities. We argue that rational norms tied to coherence apply to some language models but not to others. This issue is significant because rationality is closely tied to predicting and explaining behavior, and it therefore bears on AI safety and alignment as well as on understanding model behavior more generally.
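To make the credence proposal concrete, here is a minimal sketch of how a strength of belief might be read off from next-token probabilities, assuming a HuggingFace causal language model. The model name, prompt template, and "Yes"/"No" token pair are illustrative assumptions, not the paper's exact MAC construction.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any causal LM exposing next-token logits works.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def credence(statement: str) -> float:
    """Illustrative credence: the model's next-token probability of
    assenting ("Yes") to a statement, renormalized against dissent ("No").
    This is a sketch under assumed conventions, not the paper's exact
    MAC construction."""
    prompt = f"Question: Is the following true? {statement}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    # Take the first sub-token in case the word tokenizes into several.
    yes_id = tokenizer.encode(" Yes")[0]
    no_id = tokenizer.encode(" No")[0]
    p_yes, p_no = probs[yes_id].item(), probs[no_id].item()
    return p_yes / (p_yes + p_no)  # strength of belief in [0, 1]

print(credence("Paris is the capital of France."))
```

Under such a reading, a probabilistic coherence norm could be checked empirically, for example by testing whether the credences assigned to a statement and to its negation sum to roughly 1.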
- Publication:
- arXiv e-prints
- Pub Date:
- June 2024
- DOI:
- 10.48550/arXiv.2406.03442
- arXiv:
- arXiv:2406.03442
- Bibcode:
- 2024arXiv240603442H
- Keywords:
- Computer Science - Computation and Language;
- Computer Science - Artificial Intelligence
- E-Print:
- added discussion and cross-reference of new empirical work by the authors, updated references, fixed typos