Oxford University Press, Nucleic Acids Research, 7(39), p. 2492-2502, 2010
DOI: 10.1093/nar/gkq1081
Full text: Download
Multicellular organismal development is controlled by a complex network of transcription factors, promoters and enhancers. Although reliable computational and experimental methods exist for enhancer detection, prediction of their target genes remains a major challenge. On the basis of available literature and ChIP-seq and ChIP-chip data for enhanceosome factor p300 and the transcriptional regulator Gli3, we found that genomic proximity and conserved synteny predict target genes with a relatively low recall of 12-27% within 2 Mb intervals centered at the enhancers. Here, we show that functional similarities between enhancer binding proteins and their transcriptional targets and proximity in the protein-protein interactome improve prediction of target genes. We used all four features to train random forest classifiers that predict target genes with a recall of 58% in 2 Mb intervals that may contain dozens of genes, representing a better than two-fold improvement over the performance of prediction based on single features alone. Genome-wide ChIP data is still relatively poorly understood, and it remains difficult to assign biological significance to binding events. Our study represents a first step in integrating various genomic features in order to elucidate the genomic network of long-range regulatory interactions.