Multicollinearity Resolution Based on Machine Learning: A Case Study of Carbon Emissions in Sichuan Province
Abstract
This study preprocessed 2000-2019 energy consumption data for 46 key industries in Sichuan Province using matrix normalization. DBSCAN clustering then grouped the industries objectively, identifying 16 feature classes. Penalized regression models were applied next because they control overfitting, handle high-dimensional data, and perform feature selection, which makes them well suited to complex energy data. The results showed that the second cluster, centered on coal, had the highest emissions, driven by production needs. Emissions from the gasoline-focused and coke-focused clusters were also significant. Based on these findings, the suggested emission reduction measures included clean coal technologies, transportation management, coal-to-electricity substitution in the steel industry, and industry standardization. By introducing unsupervised learning to select factors objectively, the research aimed to explore new avenues for emission reduction. In summary, the study used DBSCAN and penalized regression to identify industry groupings, assess emission drivers, and propose reduction strategies that can better inform decision-making.
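To illustrate why penalized regression helps with multicollinearity, the following is a minimal sketch of a single step only, not the authors' pipeline. It assumes ridge (L2) regression as one representative penalized model, since the abstract does not name the penalties used. The data are synthetic, and the function name `ridge_2d` and the penalty value `lam=1.0` are hypothetical choices made for this example.

```python
def ridge_2d(X, y, lam):
    """Closed-form ridge fit for two features, no intercept.

    Solves (X^T X + lam * I) b = X^T y by Cramer's rule.
    lam = 0 reduces to ordinary least squares.
    """
    a11 = sum(r[0] * r[0] for r in X) + lam
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X) + lam
    b1 = sum(r[0] * t for r, t in zip(X, y))
    b2 = sum(r[1] * t for r, t in zip(X, y))
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Synthetic data (an assumption for illustration): two almost identical
# predictors, standing in for two highly correlated energy inputs.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
diff = [0.01, -0.01, 0.01, -0.01, 0.0]
x2 = [a + d for a, d in zip(x1, diff)]

# The response is x1 + x2 plus a small disturbance aligned with the
# tiny difference between the predictors.
y = [a + b + 10 * d for a, b, d in zip(x1, x2, diff)]

X = list(zip(x1, x2))
ols = ridge_2d(X, y, lam=0.0)    # unpenalized: coefficients blow up
ridge = ridge_2d(X, y, lam=1.0)  # penalized: coefficients stay near (1, 1)

print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

With these inputs the unpenalized fit lands at roughly (-9, 11), while the ridge fit stays close to (1, 1). A small disturbance is enough to destabilize the coefficients of near-collinear predictors, and the penalty suppresses that instability. An L1 penalty (lasso) would additionally drive some coefficients to zero, which corresponds to the feature selection property mentioned in the abstract.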
- Publication:
- arXiv e-prints
- Pub Date:
- September 2023
- DOI:
- 10.48550/arXiv.2309.01115
- arXiv:
- arXiv:2309.01115
- Bibcode:
- 2023arXiv230901115Z
- Keywords:
- Computer Science - Machine Learning
- E-Print:
- 21 pages, 19 figures