Nature Research, Scientific Data, 1(10), 2023
DOI: 10.1038/s41597-023-02108-z
Abstract: Defining cellular and subcellular structures in images, referred to as cell segmentation, is an outstanding obstacle to scalable single-cell analysis of multiplex imaging data. While advances in machine learning-based segmentation have led to potentially robust solutions, such algorithms typically rely on large amounts of example annotations, known as training data. Datasets consisting of annotations which are thoroughly assessed for quality are rarely released to the public. As a result, there is a lack of widely available, annotated data suitable for benchmarking and algorithm development. To address this unmet need, we release 105,774 primarily oncological cellular annotations concentrating on tumor and immune cells using over 40 antibody markers spanning three fluorescent imaging platforms, over a dozen tissue types and across various cellular morphologies. We use readily available annotation techniques to provide a modifiable community data set with the goal of advancing cellular segmentation for the greater imaging community.