


Phylogenetic analysis revealed a large number of TFBS clusters that share patterns of presence and absence across primate genomes and are enriched in specific transposable element (TE) families, suggesting that multiple waves of TE insertion spread these TFBSs during primate evolution. More than 85% of primate-specific TFBSs (representing more than 20% of all TFBSs) are derived from TEs. Using a panel of 69 genome-wide association studies, we found that conserved cCREs and constrained TFBSs achieved high heritability enrichment, demonstrating their utility for functional interpretation of human genetic variants.

We explored the ENCODE cCREs derived from epigenomic data and the binding sites of 367 TFs derived from chromatin immunoprecipitation data. We identified ~439 thousand deeply conserved cCREs (47.5% of cCREs and 4% of the human genome) and 2 million TFBSs (0.8% of the human genome) under mammalian constraint. Conserved elements predominate near genes that function in fundamental cellular processes (metabolism, development) and tend to be functional in other mammalian genomes, whereas unconstrained elements lie near genes involved in interaction with the environment. We found a spectrum of mammalian conservation for regulatory elements: at one end lie the highly conserved cCREs and constrained TFBSs, and at the other lie primate-specific cCREs and TFBSs that overlap transposable elements (TEs).
