Analysis of the tryptic search space in UniProt databases

Alpi, Emanuele; Griss, Johannes; da Silva, Alan Wilter Sousa; Bely, Benoit; Antunes, Ricardo; Zellner, Hermann; Ríos, Daniel; O'Donovan, Claire; Vizcaíno, Juan Antonio; Martin, Maria J.

Published in

Wiley, Proteomics, 1(15), p. 48-57, 2014

DOI: 10.1002/pmic.201400227

Journal article published in 2014 by Emanuele Alpi, Johannes Griss, Alan Wilter Sousa da Silva, Benoit Bely, Ricardo Antunes, Hermann Zellner, Daniel Ríos, Claire O'Donovan, Juan Antonio Vizcaíno, Maria J. Martin

This paper is made freely available by the publisher.

Abstract

In this manuscript, we provide a comprehensive study of the content of UniProt protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt Knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding IPI (International Protein Index), RefSeq, Ensembl and UniRef100 organism-specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease-associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was compared against the available peptide-level identifications in the main mass spectrometry-based proteomics repositories. The peptides observed in these repositories are an important source of information for protein databases, as they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism-specific sequence data sets available from UniProt, together with the pros and cons of each, in terms of search space for mass spectrometry-based bottom-up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases, enabling scientists to select those most appropriate for their purposes.
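To make the notions of tryptic search space and peptide unicity concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P) and reads "unicity" as a peptide mapping to exactly one protein entry in a data set. The function names, the length limits, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cleavage_fragments(sequence):
    """Split a protein sequence into fully cleaved tryptic fragments.

    Assumed rule: cleave after K or R, except when followed by P.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing fragment that does not end in K or R.
        fragments.append(sequence[start:])
    return fragments


def tryptic_peptides(sequence, missed_cleavages=0, min_len=6, max_len=50):
    """Return the set of tryptic peptides, allowing missed cleavages.

    The default length window is an illustrative assumption.
    """
    fragments = cleavage_fragments(sequence)
    peptides = set()
    for i in range(len(fragments)):
        last = min(i + missed_cleavages + 1, len(fragments))
        for j in range(i, last):
            peptide = "".join(fragments[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return peptides


def unique_peptides(proteins, **digest_options):
    """Return peptides that occur in exactly one protein of the data set.

    `proteins` maps an accession to its amino acid sequence.
    """
    owners = defaultdict(set)
    for accession, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence, **digest_options):
            owners[peptide].add(accession)
    return {pep for pep, accs in owners.items() if len(accs) == 1}


# Toy data set with made-up accessions and sequences.
toy_proteins = {"P1": "AAKCCRPDDK", "P2": "AAKEEK"}
toy_unique = unique_peptides(toy_proteins, min_len=1)
```

In this toy data set the peptide AAK occurs in both entries, so it is part of the search space but is not unique. Adding isoforms or variant sequences to `toy_proteins` enlarges the search space and can reduce the number of unique peptides, which is the kind of trade-off the abstract refers to.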
