Published in

Oxford University Press, Bioinformatics, 2023

DOI: 10.1093/bioinformatics/btad132

An expectation-maximization framework for comprehensive prediction of isoform-specific functions

This paper is made freely available by the publisher.

Abstract

Motivation: Advances in RNA sequencing technologies have achieved unprecedented accuracy in the quantification of mRNA isoforms, but our knowledge of isoform-specific functions has lagged behind. There is a need to understand the functional consequences of differential splicing, which could be supported by the generation of accurate and comprehensive isoform-specific Gene Ontology (GO) annotations.

Results: We present isopret (Isoform Interpretation), a method that uses expectation-maximization to infer isoform-specific functions based on the relationship between sequence and functional isoform similarity. We predicted isoform-specific functional annotations for 85,617 isoforms of 17,900 protein-coding human genes, spanning 17,430 distinct GO terms. Comparison with a gold-standard corpus of manually annotated human isoform functions showed that isopret significantly outperforms state-of-the-art competing methods. We provide experimental evidence that functionally related isoforms predicted by isopret show a higher degree of domain sharing and expression correlation than functionally related genes. We also show that isoform sequence similarity correlates better with inferred isoform function than with gene-level function.

Availability and implementation: Source code, documentation, and resource files are freely available under a GNU3 license at https://github.com/TheJacksonLaboratory/isopretEM and https://zenodo.org/record/7594321.

Supplementary information: Supplementary data are available at Bioinformatics online.