Corporate Financial Distress Prediction: Based on Multi-source Data and Feature Selection
Abstract
The advent of the era of big data provides new ideas for financial distress prediction. In order to evaluate the financial status of listed companies more accurately, this study establishes a financial distress prediction indicator system based on multi-source data by integrating three data sources: the company's internal management, the external market and online public opinion. This study addresses the redundancy and dimensional explosion problems of multi-source data integration, feature selection of the fused data, and a financial distress prediction model based on maximum relevance and minimum redundancy and support vector machine recursive feature elimination (MRMR-SVM-RFE). To verify the effectiveness of the model, we used back propagation (BP), support vector machine (SVM), and gradient boosted decision tree (GBDT) classification algorithms, and conducted an empirical study on China's listed companies based on different financial distress prediction indicator systems. MRMR-SVM-RFE feature selection can effectively extract information from multi-source fused data. The new feature dataset obtained by selection has higher prediction accuracy than the original data, and the BP classification model is better than linear regression (LR), decision tree (DT), and random forest (RF).
- Publication:
-
arXiv e-prints
- Pub Date:
- April 2024
- DOI:
- 10.48550/arXiv.2404.12610
- arXiv:
- arXiv:2404.12610
- Bibcode:
- 2024arXiv240412610D
- Keywords:
-
- Statistics - Applications