µProteInS—a proteogenomics pipeline for finding novel bacterial microproteins encoded by small ORFs

de Souza, Eduardo Vieira; Dalberto, Pedro Ferrari; Machado, Vinicius Pellisoli; Canedo, Adriana; Saghatelian, Alan; Machado, Pablo; Basso, Luiz Augusto; Bizarro, Cristiano Valim

Published in

Oxford University Press, Bioinformatics, 38(9), p. 2612-2614, 2022

DOI: 10.1093/bioinformatics/btac115


Journal article published in 2022 by Eduardo Vieira de Souza, Pedro Ferrari Dalberto, Vinicius Pellisoli Machado, Adriana Canedo, Alan Saghatelian, Pablo Machado, Luiz Augusto Basso, Cristiano Valim Bizarro

This paper is made freely available by the publisher.


Abstract

Summary: Genome annotation pipelines traditionally exclude open reading frames (ORFs) shorter than 100 codons to avoid false identifications. However, studies have shown that these short ORFs may encode functional microproteins with meaningful biological roles. We developed µProteInS, a proteogenomics pipeline that combines genomics, transcriptomics and proteomics to identify novel microproteins in bacteria. The pipeline employs a model to filter out low-confidence spectra, which avoids the need to inspect mass spectrometry data manually. It also overcomes a shortcoming of traditional approaches, which usually exclude overlapping genes, leaderless transcripts and non-conserved sequences. These characteristics are common among small ORFs (smORFs) and hamper their identification.

Availability and implementation: µProteInS is implemented in Python 3.8 within an Ubuntu 20.04 environment. It is open-source software distributed under the GNU General Public License v3 and is available as a command-line tool. It can be downloaded at https://github.com/Eduardo-vsouza/uproteins and either installed from source or executed as a Docker image.

Supplementary information: Supplementary data are available at Bioinformatics online.
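To make the 100-codon cutoff concrete, the following minimal Python sketch enumerates start-to-stop ORFs on both strands of a DNA string and keeps those shorter than the cutoff. It is an illustration only and not the µProteInS implementation: the function names, the choice of start codons and the coordinate convention are assumptions introduced here, and the sketch performs none of the transcriptomic or proteomic filtering described in the abstract.

```python
# Illustrative sketch only; this is NOT the µProteInS implementation.
# All names below (find_orfs, find_smorfs, ...) are hypothetical.

START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SMORF_CODONS = 100                # length cutoff mentioned in the abstract

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq):
    """Yield (strand, frame, start, end, n_codons) for each start-to-stop ORF.

    start and end are 0-based offsets on the scanned strand, with end
    pointing just past the stop codon. n_codons excludes the stop codon.
    Only the first start codon after the previous stop is used, so each
    stop codon yields at most one (the longest) ORF per frame.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    yield strand, frame, start, i + 3, (i - start) // 3
                    start = None


def find_smorfs(seq, max_codons=MAX_SMORF_CODONS):
    """Return the ORFs from find_orfs that are shorter than max_codons."""
    return [orf for orf in find_orfs(seq) if orf[4] < max_codons]


if __name__ == "__main__":
    for orf in find_smorfs("ATGAAATAG"):
        print(orf)
```

A conventional annotation pipeline would discard every ORF this function returns. A proteogenomics approach instead treats them as candidates to be confirmed or rejected with transcript and peptide evidence.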
