Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP-seq peaks
Transcription factors (TFs) recognize short sequence motifs that are present in millions of copies in large eukaryotic genomes. TFsmust distinguish their target binding sites from a vast genomic excess of spurious motif occurrences; however, it is unclear whether functional sites are distinguished from nonfunctional motifs by local primary sequence features or by the larger genomic context in which motifs reside. We used a massively parallel enhancer assay in living mouse retinas to compare 1,300 sequences bound in the genome by the photoreceptor transcription factor Cone-rod homeobox (Crx), to 3,000 control sequences. We found that very short sequences bound in the genome by Crx activated transcription at high levels, whereas unbound genomic regions with equal numbers of Crx motifs did not activate above background levels, even when liberated from their larger genomic context. High local GC content strongly distinguishes bound motifs from unbound motifs across the entire genome. Our results show that the cis-regulatory potential of TF-bound DNA is determined largely by highly local sequence features and not by genomic context.