HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

doi:10.48550/arXiv.2310.16755

HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

Theory of Mind (ToM) is the ability to reason about one's own and others' mental states. ToM plays a critical role in the development of intelligence, language understanding, and cognitive processes. While previous work has primarily focused on first and second-order ToM, we explore higher-order ToM, which involves recursive reasoning on others' beliefs. We introduce HI-TOM, a Higher Order Theory of Mind benchmark. Our experimental evaluation using various Large Language Models (LLMs) indicates a decline in performance on higher-order ToM tasks, demonstrating the limitations of current LLMs. We conduct a thorough analysis of different failure cases of LLMs, and share our thoughts on the implications of our findings on the future of NLP.

Publication:

arXiv e-prints

Pub Date:

October 2023

DOI:

10.48550/arXiv.2310.16755

arXiv:

arXiv:2310.16755

Bibcode:

2023arXiv231016755H

Keywords:

Computer Science - Computation and Language;
Computer Science - Artificial Intelligence

E-Print:

Accepted at Findings of EMNLP 2023

NASA/ADS

HI-TOM: A Benchmark for Evaluating Higher-Order Theory of Mind Reasoning in Large Language Models

Abstract