Applied Machine Learning QSPR Model of Aromatic Hydrocarbon Solute Transport in Sandstone with XGBoost
Abstract
Miscible displacement (MD) tests are commonly used to model solute transport in porous media, with the results represented graphically as breakthrough curves (BTCs). Given the recent development of high-performance machine learning (ML) algorithms, this work describes a nonparametric method that adapts a data set from a series of MD tests, conducted on a homologous series of hydrophobic organic chemicals comprising 17 mononuclear aromatic hydrocarbons (MAHs), to train a monotonic-in-time XGBoost model that predicts BTCs. The procedure involves preprocessing, feature selection using quantitative structure-property relationships (QSPR), and pruning of the initial feature space. Pruning addresses excessive dimensionality by applying the least absolute shrinkage and selection operator (LASSO) and backward deletion before the final model is developed. The QSPR model is assessed against simpler models that use only one QSPR feature, time, and the injection concentration. The resulting XGBoost models, both the QSPR model and the simpler models, give robust solute transport predictions when evaluated using root mean squared error (RMSE) and R² values. Based on the out-of-sample performance obtained here, we expect the models to predict MAH BTCs with an average R² of 0.95. The monotonically increasing predicted BTCs were found to closely approximate the true BTCs.
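The evaluation described in the abstract rests on two standard metrics, RMSE and R², plus the requirement that a predicted BTC never decreases in time. The following is a minimal sketch of those three checks in plain Python, not the authors' code. The function names and the sample concentration values are hypothetical and serve only as illustration; they are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def is_monotonic_increasing(values, tol=1e-12):
    """True if each value is at least as large as the one before it."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


# Hypothetical relative concentrations (C/C0) at successive time steps.
# These numbers are made up for illustration, not taken from the MD tests.
observed = [0.0, 0.1, 0.4, 0.8, 1.0]
predicted = [0.0, 0.2, 0.4, 0.7, 1.0]

print(f"RMSE = {rmse(observed, predicted):.4f}")
print(f"R2   = {r2_score(observed, predicted):.4f}")
print(f"monotonic prediction: {is_monotonic_increasing(predicted)}")
```

In practice each held-out MAH would contribute one observed and one predicted curve, and the per-compound scores would be averaged to obtain an out-of-sample figure such as the one quoted above.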
- Publication: AGU Fall Meeting Abstracts
- Pub Date: December 2022
- Bibcode: 2022AGUFM.H35H1214B