FDG-PET-CT-Lesions
Description
Introduction
A publicly available dataset of annotated Positron Emission Tomography/Computed Tomography (PET/CT) studies. 1014 whole-body Fluorodeoxyglucose (FDG)-PET/CT datasets (501 studies of patients with malignant lymphoma, melanoma and non-small cell lung cancer (NSCLC) and 513 studies without PET-positive malignant lesions (negative controls)) acquired between 2014 and 2018 were included. All examinations were acquired on a single, state-of-the-art PET/CT scanner. The imaging protocol consisted of a whole-body FDG-PET acquisition and a corresponding diagnostic CT scan. All FDG-avid lesions identified as malignant based on the clinical PET/CT report were manually segmented on PET images in a slice-by-slice (3D) manner. We provide the anonymized NIfTI files of all studies as well as the corresponding NIfTI segmentation masks. In addition, we provide scripts for image processing and conversion to different file formats (DICOM, mha, hdf5). Primary diagnosis, age and sex are provided as non-imaging information.
Structure and usage
The data is organized in the following structure:
|--- Patient 1
     |--- Study 1
          |--- SUV.nii.gz (PET image in SUV)
          |--- CTres.nii.gz (CT image resampled to PET)
          |--- CT.nii.gz (Original CT image)
          |--- SEG.nii.gz (Manual annotations of tumor lesions)
          |--- PET.nii.gz (Original PET image as activity counts)
     |--- Study 2 (Potential 2nd visit of same patient)
          |--- ...
|--- Patient 2
     |--- ...
|--- fdg_metadata.csv (metadata csv for studies)
We demonstrate how this dataset can be used for deep learning-based automated analysis of PET/CT data and provide the trained deep learning model: www.autopet.org
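As a rough illustration of how the folder layout above might be traversed, the following sketch collects the per-study files and reads the metadata table using only the Python standard library. The function names (`find_studies`, `missing_files`, `read_metadata`) are hypothetical and not part of the dataset's own scripts. It assumes that `fdg_metadata.csv` sits at the top level next to the patient folders; no assumption is made about the column names in that file.

```python
import csv
from pathlib import Path

# File names taken from the directory layout described above.
EXPECTED_FILES = ("SUV.nii.gz", "CTres.nii.gz", "CT.nii.gz", "SEG.nii.gz", "PET.nii.gz")


def find_studies(root):
    """Return a list of (patient, study, {file name: Path}) for every study folder."""
    studies = []
    patient_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for patient_dir in patient_dirs:
        study_dirs = sorted(s for s in patient_dir.iterdir() if s.is_dir())
        for study_dir in study_dirs:
            # Keep only the expected files that are actually present.
            files = {
                name: study_dir / name
                for name in EXPECTED_FILES
                if (study_dir / name).is_file()
            }
            studies.append((patient_dir.name, study_dir.name, files))
    return studies


def missing_files(studies):
    """Map (patient, study) to the expected file names that were not found."""
    report = {}
    for patient, study, files in studies:
        absent = [name for name in EXPECTED_FILES if name not in files]
        if absent:
            report[(patient, study)] = absent
    return report


def read_metadata(csv_path):
    """Read the metadata CSV into a list of dicts keyed by the header row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))
```

A typical call would be `find_studies("/path/to/dataset")` followed by `missing_files(...)` to check that each study folder is complete before any images are loaded.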
PET/CT acquisition protocol
Patients fasted for at least 6 h prior to the injection of approximately 350 MBq 18F-FDG. Whole-body PET/CT acquisitions were performed on a Biograph mCT PET/CT scanner (Siemens Healthcare GmbH, Erlangen, Germany) and were initiated approximately 60 min after intravenous tracer administration. Diagnostic CT scans of the neck, thorax, abdomen and pelvis (200 reference mAs; 120 kV) were acquired 90 s after intravenous injection of a contrast agent (90–120 ml Ultravist 370, Bayer AG). PET images were reconstructed iteratively (three iterations, 21 subsets) with Gaussian post-reconstruction smoothing (2 mm full width at half-maximum). Slice thickness on contrast-enhanced CT was 2 or 3 mm.
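To make the relation between activity images and SUV images more concrete, here is a minimal sketch of the commonly used body-weight SUV formula. It is not taken from the dataset's conversion scripts, which may differ in detail. The function name and parameters are hypothetical. It assumes that the activity concentration is given in Bq/ml and is decay-corrected to the scan start, that 1 ml of tissue weighs about 1 g, and that the physical half-life of 18F is about 109.77 min.

```python
import math

# Physical half-life of 18F in seconds (about 109.77 min).
F18_HALF_LIFE_S = 109.77 * 60.0


def suv_body_weight(activity_bq_per_ml, injected_dose_bq, weight_kg, delay_s):
    """Body-weight SUV for one voxel value (illustrative sketch).

    activity_bq_per_ml: measured activity concentration, assumed to be
        decay-corrected to the start of the scan.
    injected_dose_bq:   injected activity at injection time.
    weight_kg:          patient body weight.
    delay_s:            time between injection and scan start.
    """
    # Decay the injected dose from injection time to scan start.
    dose_at_scan_bq = injected_dose_bq * 2.0 ** (-delay_s / F18_HALF_LIFE_S)
    # Dose per gram of body weight; with 1 ml ~ 1 g the ratio is unitless.
    dose_per_gram = dose_at_scan_bq / (weight_kg * 1000.0)
    return activity_bq_per_ml / dose_per_gram
```

With the protocol values above, a call might look like `suv_body_weight(5000.0, 350e6, 70.0, 60 * 60)`, where the voxel value and the 70 kg body weight are made-up inputs for illustration.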
Annotation
Two experts annotated training and test data: at the University Hospital Tübingen, a radiologist with 10 years of experience in hybrid imaging and experience in machine learning research annotated all data. At the University Hospital of the LMU in Munich, a radiologist with 5 years of experience in hybrid imaging and experience in machine learning research annotated all data.
The following annotation protocol was defined:
Step 1: Identification of FDG-avid tumor lesions by visual assessment of PET and CT information together with the clinical examination reports.
Step 2: Manual free-hand segmentation of identified lesions in axial slices.
Additional details
- Accuracy
The dataset comprises 1,014 whole-body FDG-PET/CT scans, with 501 studies from patients diagnosed with malignant lymphoma, melanoma, or non-small cell lung cancer (NSCLC), and 513 studies without PET-positive malignant lesions (serving as negative controls). All FDG-avid lesions identified as malignant were manually segmented on PET images in a slice-by-slice manner by a single reader using dedicated software, ensuring precise representation of the true values of the intended attributes in the specific context of use.
- Completeness
Each dataset includes the NIfTI files converted from the original DICOM files of the PET/CT scans along with the corresponding segmentation masks. Non-imaging information (primary diagnosis, age, and sex) is also provided, ensuring that all expected attributes and related entity instances are present for each subject. The conversion scripts can be found on GitHub.
- Conformity
The data adheres to established standards and conventions in medical imaging. The PET images characterize tumoral glucose metabolism, while the CT scans provide complementary anatomical localization of the tumor, following standard imaging protocols.
- Consistency
All examinations were acquired on a single, state-of-the-art PET/CT scanner (Siemens Biograph mCT) between 2014 and 2018 at the University Hospital Tübingen. The imaging protocol consisted of a diagnostic CT scan (mainly from skull base to mid-thigh level) with intravenous contrast enhancement in most cases, except for patients with contraindications. The following CT parameters were used: reference dose of 200 mAs, tube voltage of 120 kV, iterative reconstruction with a slice thickness of 2–3 mm. In addition, a whole-body FDG-PET scan was acquired 60 minutes after intravenous injection of 300–350 MBq 18F-FDG. PET data were reconstructed using attenuation correction with a slice thickness of 5 mm. This uniformity in data acquisition protocols ensures that the data is free from contradictions and coherent across the dataset.
- Credibility
The dataset has been published in a peer-reviewed journal (Nature Scientific Data) and the DICOM files are hosted by The Cancer Imaging Archive (TCIA). Datasets are in use by the autoPET machine learning challenge series.
- Processability
The dataset is provided in NIfTI format, which is widely supported by medical imaging software and can be readily processed by automated systems. Additionally, scripts for image processing and conversion to different file formats (DICOM, mha, hdf5) are provided, enhancing machine interpretability and handling by automated processes.
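As a small illustration of machine processability, the sketch below reads the image dimensions and voxel spacing from a `.nii.gz` file using only the Python standard library. It assumes the files use the NIfTI-1 single-file layout (348-byte header); the function names are hypothetical. In practice a dedicated library such as nibabel or SimpleITK would normally be used instead.

```python
import gzip
import struct


def read_nifti1_header(path):
    """Return shape and voxel spacing from a NIfTI-1 header (sketch)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        header = handle.read(348)
    if len(header) < 348:
        raise ValueError("file is too short to hold a NIfTI-1 header")

    # sizeof_hdr (offset 0) must equal 348; use it to detect the byte order.
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", header[0:4])[0] == 348:
            break
    else:
        raise ValueError("sizeof_hdr is not 348, not a NIfTI-1 file")

    # Magic string at offset 344: "n+1" (single file) or "ni1" (hdr/img pair).
    if header[344:348] not in (b"n+1\x00", b"ni1\x00"):
        raise ValueError("missing NIfTI-1 magic string")

    dim = struct.unpack(endian + "8h", header[40:56])      # dim[0] = number of axes
    pixdim = struct.unpack(endian + "8f", header[76:108])  # pixdim[1..] = spacing
    ndim = dim[0]
    return {
        "shape": tuple(dim[1:1 + ndim]),
        # Units are given by the xyzt_units field, which this sketch ignores.
        "spacing": tuple(pixdim[1:1 + ndim]),
    }


def voxel_volume(spacing):
    """Volume of one voxel in cubed spacing units (first three axes)."""
    x, y, z = spacing[:3]
    return x * y * z
```

Multiplying `voxel_volume(...)` by the number of non-zero voxels in a segmentation mask would give a lesion volume, provided the mask and the header describe the same grid.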
- Relevance
The dataset is highly relevant for developing and training machine learning methods for automated analysis of PET/CT data, addressing the limited availability of publicly available high-quality training data for PET/CT image analysis projects.
- Timeliness
The data was collected between 2014 and 2018, providing relatively recent imaging data that reflects current clinical practices.
- Understandability
Comprehensive documentation accompanies the dataset, detailing the data acquisition process, segmentation methodology, and potential applications. This facilitates users' ability to read, interpret, and utilize the data effectively.
- Data File format
- zip
- Data source type
- medical/clinical registers/records/accounts
- General data format
- still image
- Code repository
- https://github.com/lab-midas/TCIA_processing/
- Copyright holder
- University Hospital Tübingen
- Copyright year
- 2022
- Is accessible for free
- Yes