Issues arising from benchmarking single-cell RNA sequencing imputation methods
Abstract
On June 25, 2018, Huang et al. published SAVER, a computational method for imputing dropout gene expression levels in single-cell RNA sequencing (scRNA-seq) data, in Nature Methods. Huang et al. performed a set of comprehensive benchmarking analyses, including a comparison with data from RNA fluorescence in situ hybridization, to demonstrate that SAVER outperformed two existing scRNA-seq imputation methods, scImpute and MAGIC. However, their computational analyses were based on semi-synthetic data that the authors had generated following the Poisson-Gamma model used in the SAVER method. We have therefore re-examined Huang et al.'s study. We find that the semi-synthetic data have very different properties from those of real scRNA-seq data and that the cell clusters used for benchmarking are inconsistent with the cell types labeled by biologists. We show that a reanalysis based on real scRNA-seq data and grounded in biological knowledge of cell types leads to different results and conclusions from those of Huang et al.
- Publication: arXiv e-prints
- Pub Date: August 2019
- arXiv: arXiv:1908.07084
- Bibcode: 2019arXiv190807084L
- Keywords: Statistics - Applications; Quantitative Biology - Genomics; Quantitative Biology - Quantitative Methods
- E-Print: 5 pages