Defining a personal, allele-specific, and single-molecule long-read transcriptome
Abstract
RNA molecules of higher eukaryotes can be thousands of nucleotides long and are expressed from two distinct alleles, which can differ by single nucleotide variations (SNVs) in the mature RNA molecule. The de facto standard in RNA biology is short (≤101 bp) read sequencing, which, although very useful, does not cover the entire molecule in a read. We show that, using amplification-free long-read sequencing, one can often (i) cover the entire molecule, (ii) determine the allele it originated from, and (iii) record its entire exon-intron structure within a single read, thus producing a full-length, allele-specific view of an individual's transcriptome. By enhancing existing gene annotations using long reads and quantifying this enhanced annotation using >100 million 101-bp paired-end reads, we compensate for the smaller number of long reads.
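Point (ii) above, determining the allele a read originated from, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `assign_allele`, the labels `allele_1` and `allele_2`, and the thresholds `min_sites` and `min_fraction` are all hypothetical choices made for illustration. The sketch assumes the individual's heterozygous SNVs are already known and that each read has been reduced to the bases it shows at those positions.

```python
from collections import Counter


def assign_allele(read_bases, het_snvs, min_sites=2, min_fraction=0.8):
    """Assign one long read to an allele by voting over heterozygous SNVs.

    read_bases: dict mapping position -> base observed in the read.
    het_snvs:   dict mapping position -> (base on allele_1, base on allele_2).
    Thresholds are illustrative assumptions, not values from the paper.
    """
    votes = Counter()
    for pos, (base_1, base_2) in het_snvs.items():
        base = read_bases.get(pos)  # None if the read does not cover this site
        if base == base_1:
            votes["allele_1"] += 1
        elif base == base_2:
            votes["allele_2"] += 1
        # Any other base (e.g. a sequencing error) casts no vote.

    total = votes["allele_1"] + votes["allele_2"]
    if total < min_sites:
        return "unassigned"  # too few informative sites on this read

    allele, count = votes.most_common(1)[0]
    return allele if count / total >= min_fraction else "unassigned"


# Toy example with three hypothetical heterozygous sites.
het = {100: ("A", "G"), 250: ("C", "T"), 900: ("G", "A")}
full_read = {100: "A", 250: "C", 900: "G"}
print(assign_allele(full_read, het))
```

Because a long read can span several SNVs of the same molecule, a single read may carry enough informative sites to be assigned on its own, which is harder for a short read that covers only part of the transcript.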
- Publication:
- Proceedings of the National Academy of Sciences
- Pub Date:
- July 2014
- DOI:
- 10.1073/pnas.1400447111
- Bibcode:
- 2014PNAS..111.9869T