ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events

doi:10.48550/arXiv.2501.03040

Large Language Models (LLMs) have achieved remarkable success in various NLP tasks, yet they still face significant challenges in reasoning and arithmetic. Temporal reasoning, a critical component of natural language understanding, has attracted increasing research attention. However, comprehensive testing of Allen's interval relations (e.g., before, after, during), a fundamental framework for temporal relationships, remains underexplored. To fill this gap, we present ChronoSense, a new benchmark for evaluating LLMs' temporal understanding. It includes 16 tasks that focus on identifying the Allen relation between two temporal events and on temporal arithmetic, using both abstract events and real-world data from Wikidata. We assess the performance of seven recent LLMs on this benchmark, and the results indicate that the models handle Allen relations, even symmetrical ones, quite differently. Moreover, the findings suggest that the models may rely on memorization to answer time-related questions. Overall, the models' low performance highlights the need for improved temporal understanding in LLMs, and ChronoSense offers a robust framework for future research in this area. Our dataset and source code are available at https://github.com/duyguislakoglu/chronosense.
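To make the notion of an Allen relation concrete: Allen's interval algebra defines 13 mutually exclusive relations between two intervals (seven basic relations plus six inverses). The sketch below is a minimal illustration, not the ChronoSense code. It assumes each event is a `(start, end)` pair of integers (e.g. years) with `start < end`, and the function name `allen_relation` and the `INVERSE` table are hypothetical. The relation labels follow common usage of Allen's algebra and are not necessarily the wording ChronoSense uses in its prompts. The inverse table shows one reading of the "symmetrical" relations mentioned above: swapping the two events turns `before` into `after`, `during` into `contains`, and so on.

```python
from typing import Tuple

# Hypothetical representation: an event as (start, end), e.g. years, with start < end.
Interval = Tuple[int, int]

# Each Allen relation paired with its inverse (the relation obtained by swapping a and b).
INVERSE = {
    "before": "after", "after": "before",
    "meets": "met-by", "met-by": "meets",
    "overlaps": "overlapped-by", "overlapped-by": "overlaps",
    "starts": "started-by", "started-by": "starts",
    "during": "contains", "contains": "during",
    "finishes": "finished-by", "finished-by": "finishes",
    "equals": "equals",
}


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    a_start, a_end = a
    b_start, b_end = b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    # Disjoint intervals with a gap between them.
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    # Intervals that touch at exactly one endpoint.
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    # Shared endpoints.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    # Strict nesting.
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Remaining case: partial overlap with all four endpoints distinct.
    return "overlaps" if a_start < b_start else "overlapped-by"


print(allen_relation((1990, 2000), (1995, 2010)))  # overlaps
print(allen_relation((1995, 2010), (1990, 2000)))  # overlapped-by
```

Ordering the checks from disjoint, to touching, to shared endpoints, to nesting means the final branch only has to distinguish the two partial-overlap cases. A correct classifier always satisfies `allen_relation(b, a) == INVERSE[allen_relation(a, b)]`, which is the kind of consistency the abstract reports that the evaluated models do not reliably show.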

Publication: arXiv e-prints
Pub Date: January 2025
DOI: 10.48550/arXiv.2501.03040
arXiv: arXiv:2501.03040
Bibcode: 2025arXiv250103040S
Keywords: Computer Science - Machine Learning; Computer Science - Computation and Language
E-Print: 14 pages, 2 figures