Machine Learning and statistical classification of CRISPR-Cas12a diagnostic assays
Abstract
CRISPR-based diagnostics have gained increasing attention as biosensing tools able to address limitations of contemporary molecular diagnostic tests. To maximise the performance of CRISPR-based assays, much effort has focused on optimising the chemistry and biology of the biosensing reaction. However, less attention has been paid to improving the techniques used to analyse CRISPR-based diagnostic data. To date, diagnostic decisions have typically relied on various forms of slope-based classification. Such methods outperform traditional methods based on absolute signal levels but still have limitations. Herein, we establish performance benchmarks (total accuracy, sensitivity, and specificity) using common slope-based methods. We compare these benchmark methods with three different quadratic empirical distribution function statistical tests, finding significant improvements in diagnostic speed and accuracy when they are applied to a clinical data set. Two of the three statistical techniques, the Kolmogorov-Smirnov and Anderson-Darling tests, yield the shortest time-to-result and the highest total test accuracy. Furthermore, we developed a long short-term memory recurrent neural network to classify CRISPR-biosensing data, achieving 100% specificity on our model data set. Finally, we provide guidelines for choosing the classification method and parameters that best suit a diagnostic assay's needs.
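The abstract contrasts slope-based calls with empirical distribution function (EDF) tests. The following is a minimal Python sketch of both ideas, not the authors' pipeline: the least-squares slope, the hypothetical `min_delta` threshold, the choice to compare raw readings from a sample against a no-target control, and all readings are illustrative assumptions. The KS test here treats readings as independent draws, which autocorrelated time-series data only approximate. In practice one would more likely use `scipy.stats.ks_2samp`, or `scipy.stats.anderson_ksamp` for an Anderson-Darling variant, instead of this hand-rolled statistic.

```python
import math
from bisect import bisect_right
from statistics import mean


def ols_slope(times, signal):
    """Ordinary least-squares slope of a fluorescence trace against time."""
    t_bar, y_bar = mean(times), mean(signal)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, signal))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def slope_classify(times, sample, control, min_delta=0.5):
    """Slope-based call (assumed rule): positive if the sample's slope exceeds
    the control's slope by more than `min_delta`, a hypothetical threshold."""
    return ols_slope(times, sample) - ols_slope(times, control) > min_delta


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical gap
    between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the supremum is attained at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_classify(sample, control, alpha=0.05):
    """EDF-based call: positive if D exceeds the large-sample critical value
    c(alpha) * sqrt((n + m) / (n * m)) of a two-sided test at level alpha."""
    n, m = len(sample), len(control)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    d_crit = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample, control) > d_crit


# Made-up readings at 10 time points, arbitrary fluorescence units.
times = list(range(10))
control = [100, 101, 99, 100, 102, 100, 101, 99, 100, 101]     # no-target control
positive = [100, 104, 109, 113, 118, 122, 127, 131, 136, 140]  # rising trace
negative = [100, 100, 101, 99, 101, 100, 102, 100, 99, 101]    # flat trace

print(slope_classify(times, positive, control), ks_classify(positive, control))  # True True
print(slope_classify(times, negative, control), ks_classify(negative, control))  # False False
```

With these made-up traces both rules agree. Note that the KS call needs only a significance level `alpha` rather than an assay-specific slope cutoff, which may be one reason EDF tests are attractive when such a cutoff is hard to tune.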
- Publication: arXiv e-prints
- Pub Date: January 2025
- arXiv: arXiv:2501.04413
- Bibcode: 2025arXiv250104413K
- Keywords: Quantitative Biology - Quantitative Methods; Computer Science - Machine Learning
- E-Print: 25 pages, 5 figures, research paper. Nathan Khosla and Jake M. Lesinski contributed equally. Electronic supporting information is included as an appendix.