Nature Research, Nature Methods, 9(9), p. 819-821, 2012
DOI: 10.1038/nmeth.2085
Detecting genomic structural variants from high-throughput sequencing data is a complex and unresolved challenge. We have developed a statistical learning approach, based on Random Forests, that integrates prior knowledge about the characteristics of structural variants and improves their discovery in high-throughput sequencing data. The implementation of this technique, forestSV, offers high sensitivity and specificity coupled with the flexibility of a data-driven approach.