Training set of clinical meta-data

This paper is available in a repository.

Abstract

This MD Anderson Cancer Center set of anonymized, high-quality computed tomography (CT) scans with contrast represents a comparatively homogeneous cohort of 288 oropharynx cancer patients with detailed clinical history, consistent follow-up of more than 2 years, and known etiological/biological correlates (specifically, human papillomavirus status). Our major aim is to assess and validate the radiomics workflow and the predictive capacity of the radiomics signatures submitted by challenge participants. From the patients’ electronic medical records, we imported the CT scans that were performed before the start of the radiation treatment course. All patients were treated with intensity-modulated radiation therapy (IMRT), and some were also prescribed concurrent chemotherapy. We intended the CT scans to be as representative as possible of the original simulation CT scans used for treatment planning, in which no contrast was injected, in accordance with our institutional policy. We posted roughly one-half of the CT scans from the dataset (138 patients), in DICOM-RT format, on the Kaggle in Class server system as a “training set”. The DICOM-RT files were fully anonymized, with expert-physician segmentation of the primary tumor and lymph nodes as regions of interest, to eliminate segmentation-related uncertainty for challengers. The primary oropharyngeal tumor was segmented in red, whereas the metastatic cervical lymph nodes were segmented individually rather than on the basis of the nodal level classification system.
Both the training and test sets include the following data for each DICOM-RT case: age, gender, race, tumor side and subsite, T-category, N-category, AJCC stage, pathologic grade, and smoking status (in pack-years). Challenge participants will also be able to download a “test” dataset, which includes the DICOM files and relevant clinical meta-data of the remaining 150 randomly selected patients, with local control status blinded.
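
The split described above can be illustrated with a minimal Python sketch. This is not the procedure the challenge organizers used, which the abstract does not specify. The case identifiers, the seed, and the `local_control` field name are hypothetical placeholders; only the counts (288 patients, 138 training, 150 test) come from the text.

```python
import random


def split_cohort(patient_ids, n_train, seed=0):
    """Randomly partition patient IDs into a training list and a test list."""
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


def blind_outcome(records, field="local_control"):
    """Return copies of the records with the outcome field removed."""
    return [{k: v for k, v in rec.items() if k != field} for rec in records]


# Hypothetical placeholder IDs; the real identifiers are not given in the abstract.
patient_ids = [f"case_{i:03d}" for i in range(1, 289)]
train_ids, test_ids = split_cohort(patient_ids, n_train=138)

# Hypothetical clinical records for the test cases, with a made-up outcome value.
test_records = [{"id": pid, "age": None, "local_control": None} for pid in test_ids]
blinded_test_records = blind_outcome(test_records)
```

Removing the outcome field from the test records, rather than overwriting it with a placeholder value, ensures that the blinded files carry no trace of local control status.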