Issues arising from benchmarking single-cell RNA sequencing imputation methods

doi:10.48550/arXiv.1908.07084

On June 25, 2018, Huang et al. published SAVER, a computational method for imputing dropout gene expression levels in single-cell RNA sequencing (scRNA-seq) data, in Nature Methods. Huang et al. performed a comprehensive set of benchmarking analyses, including a comparison with data from RNA fluorescence in situ hybridization, to demonstrate that SAVER outperformed two existing scRNA-seq imputation methods, scImpute and MAGIC. However, their computational analyses were based on semi-synthetic data that the authors had generated following the Poisson-Gamma model used in the SAVER method. We have therefore re-examined Huang et al.'s study. We find that the semi-synthetic data have very different properties from those of real scRNA-seq data and that the cell clusters used for benchmarking are inconsistent with the cell types labeled by biologists. We show that a reanalysis based on real scRNA-seq data and grounded in biological knowledge of cell types leads to results and conclusions that differ from those of Huang et al.

Publication: arXiv e-prints

Pub Date: August 2019

DOI: 10.48550/arXiv.1908.07084

arXiv: arXiv:1908.07084

Bibcode: 2019arXiv190807084L

Keywords: Statistics - Applications; Quantitative Biology - Genomics; Quantitative Biology - Quantitative Methods

E-Print: 5 pages
