A Foundation for Reliable Spatial Proteomics Data Analysis
Quantitative mass-spectrometry-based spatial proteomics involves elaborate, expensive, and time-consuming experimental procedures, and considerable effort is invested in the generation of such data. Multiple research groups have described a variety of approaches for establishing high-quality proteome-wide datasets. However, data analysis is as critical as data production for reliable and insightful biological interpretation, and no consistent and robust solutions have been offered to the community so far. Here, we introduce the requirements for rigorous spatial proteomics data analysis, as well as the statistical machine learning methodologies needed to address them, including supervised and semi-supervised machine learning, clustering, and novelty detection. We present freely available software solutions that implement innovative state-of-the-art analysis pipelines and illustrate the use of these tools through several case studies involving multiple organisms, experimental designs, mass spectrometry platforms, and quantitation techniques. We also propose sound analysis strategies for identifying dynamic changes in subcellular localization by comparing and contrasting data describing different biological conditions. We conclude by discussing future needs and developments in spatial proteomics data analysis.