A Knowledge-Informed Large Language Model Framework for U.S. Nuclear Power Plant Shutdown Initiating Event Classification for Probabilistic Risk Assessment
Abstract
Identifying and classifying shutdown initiating events (SDIEs) is critical for developing low power shutdown probabilistic risk assessment for nuclear power plants. Existing computational approaches cannot achieve satisfactory performance due to the challenges of unavailable large, labeled datasets, imbalanced event types, and label noise. To address these challenges, we propose a hybrid pipeline that integrates a knowledge-informed machine learning mode to prescreen non-SDIEs and a large language model (LLM) to classify SDIEs into four types. In the prescreening stage, we proposed a set of 44 SDIE text patterns that consist of the most salient keywords and phrases from six SDIE types. Text vectorization based on the SDIE patterns generates feature vectors that are highly separable by using a simple binary classifier. The second stage builds Bidirectional Encoder Representations from Transformers (BERT)-based LLM, which learns generic English language representations from self-supervised pretraining on a large dataset and adapts to SDIE classification by fine-tuning it on an SDIE dataset. The proposed approaches are evaluated on a dataset with 10,928 events using precision, recall ratio, F1 score, and average accuracy. The results demonstrate that the prescreening stage can exclude more than 97% non-SDIEs, and the LLM achieves an average accuracy of 93.4% for SDIE classification.
- Publication:
-
arXiv e-prints
- Pub Date:
- September 2024
- DOI:
- 10.48550/arXiv.2410.00929
- arXiv:
- arXiv:2410.00929
- Bibcode:
- 2024arXiv241000929X
- Keywords:
-
- Computer Science - Artificial Intelligence;
- Computer Science - Machine Learning