Oxford University Press, Bioinformatics, 2023
DOI: 10.1093/bioinformatics/btad132
Abstract
Motivation: Advances in RNA sequencing technologies have achieved unprecedented accuracy in the quantification of mRNA isoforms, but our knowledge of isoform-specific functions has lagged behind. There is a need to understand the functional consequences of differential splicing, which could be supported by the generation of accurate and comprehensive isoform-specific Gene Ontology (GO) annotations.
Results: We present Isopret (Isoform Interpretation), a method that uses expectation-maximization to infer isoform-specific functions based on the relationship between sequence and functional isoform similarity. We predicted isoform-specific functional annotations for 85,617 isoforms of 17,900 protein-coding human genes, spanning 17,430 distinct GO terms. Comparison with a gold-standard corpus of manually annotated human isoform functions showed that isopret significantly outperforms state-of-the-art competing methods. We provide experimental evidence that functionally related isoforms predicted by isopret show a higher degree of domain sharing and expression correlation than functionally related genes. We also show that isoform sequence similarity correlates better with inferred isoform function than with gene-level function.
Availability and implementation: Source code, documentation, and resource files are freely available under a GNU3 license at https://github.com/TheJacksonLaboratory/isopretEM and https://zenodo.org/record/7594321.
Supplementary information: Supplementary data are available at Bioinformatics online.