ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events
Abstract
Large Language Models (LLMs) have achieved remarkable success in various NLP tasks, yet they still face significant challenges in reasoning and arithmetic. Temporal reasoning, a critical component of natural language understanding, has raised increasing research attention. However, comprehensive testing of Allen's interval relations (e.g., before, after, during) -- a fundamental framework for temporal relationships -- remains underexplored. To fill this gap, we present ChronoSense, a new benchmark for evaluating LLMs' temporal understanding. It includes 16 tasks, focusing on identifying the Allen relation between two temporal events and temporal arithmetic, using both abstract events and real-world data from Wikidata. We assess the performance of seven recent LLMs using this benchmark and the results indicate that models handle Allen relations, even symmetrical ones, quite differently. Moreover, the findings suggest that the models may rely on memorization to answer time-related questions. Overall, the models' low performance highlights the need for improved temporal understanding in LLMs and ChronoSense offers a robust framework for future research in this area. Our dataset and the source code are available at https://github.com/duyguislakoglu/chronosense.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2025
- DOI:
- arXiv:
- arXiv:2501.03040
- Bibcode:
- 2025arXiv250103040S
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Computation and Language
- E-Print:
- 14 pages, 2 figures