A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements
Abstract
Spatial studies of the transcriptome provide biologists with gene expression maps of heterogeneous and complex tissues. However, most experimental protocols for spatial transcriptomics require selecting beforehand a small fraction of the transcriptome's genes to quantify. Standard single-cell RNA sequencing (scRNA-seq) is more prevalent, easier to implement, and can in principle capture any gene, but it cannot recover the spatial location of the cells. In this manuscript, we focus on the problem of imputing missing genes in spatial transcriptomic data using (unpaired) standard scRNA-seq data from the same biological tissue. Building upon domain adaptation work, we propose gimVI, a deep generative model for the integration of spatial transcriptomic data and scRNA-seq data that can be used to impute missing genes. After describing our generative model and an inference procedure for it, we compare gimVI to alternative methods from computational biology and domain adaptation on real datasets, where it outperforms Seurat Anchors, Liger and CORAL at imputing held-out genes.
- Publication: arXiv e-prints
- Pub Date: May 2019
- DOI: 10.48550/arXiv.1905.02269
- arXiv: arXiv:1905.02269
- Bibcode: 2019arXiv190502269L
- Keywords: Computer Science - Machine Learning; Quantitative Biology - Genomics; Statistics - Machine Learning
- E-Print: submitted to the 2019 ICML Workshop on Computational Biology