Supervised topological data analysis for MALDI mass spectrometry imaging applications
Abstract
Background: Matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI MSI) displays significant potential for applications in cancer research, especially in tumor typing and subtyping. Lung cancer is the primary cause of tumor-related deaths, where the most lethal entities are adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). Distinguishing between these two common subtypes is crucial for therapy decisions and successful patient management. Results: We propose a new algebraic topological framework, which obtains intrinsic information from MALDI data and transforms it to reflect topological persistence. Our framework offers two main advantages. Firstly, topological persistence aids in distinguishing the signal from noise. Secondly, it compresses the MALDI data, saving storage space and optimizes computational time for subsequent classification tasks. We present an algorithm that efficiently implements our topological framework, relying on a single tuning parameter. Afterwards, logistic regression and random forest classifiers are employed on the extracted persistence features, thereby accomplishing an automated tumor (sub-)typing process. To demonstrate the competitiveness of our proposed framework, we conduct experiments on a real-world MALDI dataset using cross-validation. Furthermore, we showcase the effectiveness of the single denoising parameter by evaluating its performance on synthetic MALDI images with varying levels of noise. Conclusion: Our empirical experiments demonstrate that the proposed algebraic topological framework successfully captures and leverages the intrinsic spectral information from MALDI data, leading to competitive results in classifying lung cancer subtypes. Moreover, the frameworks ability to be fine-tuned for denoising highlights its versatility and potential for enhancing data analysis in MALDI applications.
- Publication:
-
arXiv e-prints
- Pub Date:
- February 2023
- DOI:
- arXiv:
- arXiv:2302.13948
- Bibcode:
- 2023arXiv230213948K
- Keywords:
-
- Statistics - Machine Learning;
- Computer Science - Machine Learning;
- Electrical Engineering and Systems Science - Signal Processing
- E-Print:
- 22 pages, 10 figures