American Chemical Society, Journal of Proteome Research, 8(3), p. 1193-1197, 2009
DOI: 10.1021/pr800804d
The spectrum of problems covered by proteomics studies ranges from the discovery of compartment-specific cell proteomes to clinical applications, including the identification of diagnostic markers and the monitoring of drug treatment effects. In most cases, the ultimate result of a proteomics study is a list of proteins found to be present (or differentially present) under the physiological conditions studied. These results are normally published directly in the article, in one or several tables. As a consequence, this type of information remains scattered across hundreds of proteomics publications. We have developed a Web mining tool that collects this information by searching through full-text papers and automatically selecting tables that report a list of protein identifiers. By searching through major proteomics journals, we have collected approximately 800 independent, recently published studies, which together report about 1000 different protein lists. On the basis of these data, we developed a computational tool, PLIPS (Protein Lists Identified in Proteomics Studies). PLIPS accepts as input a list of protein/gene identifiers. Using statistical analyses, PLIPS identifies recently published proteomics studies whose reported protein lists significantly intersect with the query list. PLIPS is a freely available Web-based tool ( http://mips.helmholtz-muenchen.de/proj/plips ).
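The abstract does not state which statistical test PLIPS applies. As an illustration only, the significance of an overlap between a query list and a published list could be assessed with a one-sided hypergeometric test. The sketch below assumes this test; the function name `overlap_p_value` and the `universe_size` parameter (the number of identifiers that could in principle appear in either list) are hypothetical and are not taken from PLIPS.

```python
from math import comb


def overlap_p_value(query, published, universe_size):
    """One-sided hypergeometric p-value for the overlap of two identifier lists.

    Assumption: both lists are drawn from a common universe of
    `universe_size` identifiers. This is an illustrative choice of test,
    not necessarily the one PLIPS uses.
    """
    query, published = set(query), set(published)
    k = len(query & published)   # observed overlap
    n = len(query)               # size of the query list
    K = len(published)           # size of the published list
    N = universe_size
    if n > N or K > N:
        raise ValueError("universe_size must be at least as large as each list")
    # P(X >= k): probability of an overlap at least as large as observed
    # when n identifiers are drawn at random from the universe.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)
```

A small p-value suggests that the two lists share more identifiers than expected by chance. When a query is compared against many published lists, a multiple-testing correction would normally be applied to the resulting p-values.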