Longitudinal-CT
Description
Introduction
Longitudinal-CT is a publicly available dataset of annotated longitudinal Computed Tomography (CT) studies. The dataset comprises whole-body CT scans from 300 melanoma patients who underwent longitudinal imaging for therapy response assessment. Each patient has two imaging timepoints: a baseline staging scan and a follow-up scan acquired after therapy. The dataset includes training data from a single site (UKT).
All CT examinations were acquired on state-of-the-art CT scanners using standardized protocols following international guidelines. The imaging protocol includes whole-body CT imaging, typically extending from the skull base to mid-thigh level, with possible extensions to include the entire body when clinically relevant (all data is defaced). The dataset provides anonymized NIfTI files of all CT scans along with manually annotated segmentation masks of malignant tumors, including primary tumors and metastases. The lesion center of gravity is provided for each individual lesion in the volume (baseline and follow-up scans). The tumors can change shape (progression or regression), split or merge, disappear (complete response) or newly appear (metastasis). Additionally, scripts for image processing and conversion to different file formats (DICOM, mha, hdf5) are available.
The dataset is designed to facilitate the development and evaluation of AI-based lesion detection and segmentation algorithms in longitudinal CT imaging for oncology applications. The inclusion of multiple imaging timepoints allows for the assessment of lesion progression and therapy response, providing a clinically realistic dataset for algorithm training and validation.
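The per-lesion center of gravity mentioned above can be recomputed from an integer lesion mask as a consistency check. The following is a minimal, dependency-free sketch that assumes each non-zero integer label marks one lesion and that the center of gravity is the mean voxel index of that label; whether the provided values were derived in exactly this way is an assumption. The function name `lesion_centers_of_gravity` and the toy mask are illustrative only.

```python
from collections import defaultdict


def lesion_centers_of_gravity(mask):
    """Return {label: (i, j, k)}: the mean voxel index of every non-zero label.

    `mask` is a 3D nested sequence of integers, with 0 meaning background.
    Assumption: one integer label corresponds to one lesion.
    """
    # Per label: running sums of the i, j, k indices and the voxel count.
    sums = defaultdict(lambda: [0, 0, 0, 0])
    for i, plane in enumerate(mask):
        for j, row in enumerate(plane):
            for k, label in enumerate(row):
                if label:
                    acc = sums[label]
                    acc[0] += i
                    acc[1] += j
                    acc[2] += k
                    acc[3] += 1
    return {
        label: (si / n, sj / n, sk / n)
        for label, (si, sj, sk, n) in sums.items()
    }


# Toy 2 x 2 x 3 mask with two lesions (labels 1 and 2); not real data.
toy_mask = [
    [[0, 1, 1], [0, 0, 2]],
    [[0, 1, 1], [0, 0, 2]],
]
centers = lesion_centers_of_gravity(toy_mask)
```

The triple loop is only meant to show the computation. For full-size volumes, loading the mask with a NIfTI reader such as nibabel and vectorizing the sums with NumPy would be far faster.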
Structure and usage
Filenames start with a unique patient ID (10 alphanumeric characters, e.g. c6f057b865). The data is organized in the following structure:
|--- inputsTr
|    |--- c6f057b865.csv (lesion information for patient)
|    |--- c6f057b865_BL_00.json (lesion center of gravity per lesion in baseline CT; Grand-Challenge JSON format)
|    |--- c6f057b865_BL_img_BL_img_00.nii.gz (CT baseline image)
|    |--- c6f057b865_BL_mask_BL_img_00.nii.gz (CT baseline lesion mask, integer mask)
|    |--- c6f057b865_FU_00.json (lesion center of gravity per lesion in first follow-up CT; Grand-Challenge JSON format)
|    |--- c6f057b865_FU_01.json (lesion center of gravity per lesion in second follow-up CT; Grand-Challenge JSON format; if available)
|    |--- c6f057b865_FU_img_FU_img_00.nii.gz (CT follow-up image, first body region)
|    |--- c6f057b865_FU_img_FU_img_01.nii.gz (CT follow-up image, second body region; if available)
|    |--- ...
|--- targetsTr
     |--- c6f057b865_FU_mask_FU_img_00.nii.gz (CT follow-up lesion mask of first body region, integer mask)
     |--- c6f057b865_FU_mask_FU_img_01.nii.gz (CT follow-up lesion mask of second body region, integer mask; if available)
     |--- ...
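To work with this layout programmatically, the files can be grouped by patient ID from their names alone. The sketch below assumes the naming patterns shown in the listing hold for every patient and that patient IDs are exactly 10 alphanumeric characters; the regular expressions, the function name `index_dataset` and the key layout are illustrative choices, not part of the dataset's tooling. The demo builds empty placeholder files in a temporary directory rather than reading real data.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Patterns inferred from the example filenames (assumption).
PID = r"(?P<pid>[0-9A-Za-z]{10})"
NIFTI_RE = re.compile(
    PID + r"_(?P<tp>BL|FU)_(?P<kind>img|mask)_(?:BL|FU)_img_(?P<idx>\d+)\.nii\.gz$"
)
JSON_RE = re.compile(PID + r"_(?P<tp>BL|FU)_(?P<idx>\d+)\.json$")
CSV_RE = re.compile(PID + r"\.csv$")


def index_dataset(root):
    """Map patient ID -> {(role, timepoint, index): path}.

    Roles are "img", "mask", "cog" (JSON file) and "csv".
    Both inputsTr and targetsTr are scanned; unknown files are skipped.
    """
    index = defaultdict(dict)
    for folder in ("inputsTr", "targetsTr"):
        for path in sorted(Path(root, folder).glob("*")):
            name = path.name
            if m := NIFTI_RE.match(name):
                key = (m["kind"], m["tp"], int(m["idx"]))
            elif m := JSON_RE.match(name):
                key = ("cog", m["tp"], int(m["idx"]))
            elif m := CSV_RE.match(name):
                key = ("csv", None, None)
            else:
                continue
            index[m["pid"]][key] = path
    return dict(index)


# Demo on empty placeholder files that mimic the listing above.
with tempfile.TemporaryDirectory() as tmp:
    for folder, name in [
        ("inputsTr", "c6f057b865.csv"),
        ("inputsTr", "c6f057b865_BL_00.json"),
        ("inputsTr", "c6f057b865_BL_img_BL_img_00.nii.gz"),
        ("inputsTr", "c6f057b865_BL_mask_BL_img_00.nii.gz"),
        ("inputsTr", "c6f057b865_FU_img_FU_img_00.nii.gz"),
        ("targetsTr", "c6f057b865_FU_mask_FU_img_00.nii.gz"),
    ]:
        target = Path(tmp, folder, name)
        target.parent.mkdir(exist_ok=True)
        target.touch()
    demo_index = index_dataset(tmp)
```

Keying on `(role, timepoint, index)` makes it straightforward to pair a follow-up image in inputsTr with its mask in targetsTr, for example via `demo_index[pid][("mask", "FU", 0)]`.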
CSV file
The CSV file contains the following columns:
- lesion_id: Continuous ID count in the respective patient
- cog_bl: Lesion center of gravity in baseline image as 3D pixel coordinates
- img_id_bl: baseline image ID (either 0 or 1)
- cog_propagated: Lesion center of gravity (as 3D pixel coordinates) propagated from baseline to follow-up scan using a conventional registration (not available for all lesions)
- cog_fu: Lesion center of gravity in follow-up image as 3D pixel coordinates
- img_id_fu: follow-up image ID (either 0 or 1)
- lesion_type: Anatomical lesion location
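The columns above can be read with the standard `csv` module. The sketch below is hedged in two ways: the exact text encoding of the coordinate cells is not specified here, so `parse_cog` simply extracts three numbers from whatever string it finds, and the status labels ("tracked", "disappeared", "new") rest on the assumption that a lesion missing at one timepoint has an empty coordinate cell. The sample rows and `lesion_type` values are invented for illustration.

```python
import csv
import io
import re

# Matches integers, decimals and scientific notation.
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def parse_cog(cell):
    """Return (x, y, z) as floats, or None if the cell holds no 3D point."""
    values = NUMBER_RE.findall(cell or "")
    return tuple(float(v) for v in values) if len(values) == 3 else None


def lesion_status(row):
    """Classify a CSV row by which timepoints carry a center of gravity.

    Assumption: empty cog_bl / cog_fu cells mean the lesion is absent there.
    """
    bl = parse_cog(row.get("cog_bl"))
    fu = parse_cog(row.get("cog_fu"))
    if bl is not None and fu is not None:
        return "tracked"
    if bl is not None:
        return "disappeared"
    if fu is not None:
        return "new"
    return "unknown"


# Synthetic example rows; not taken from the dataset.
SAMPLE_CSV = "\n".join([
    "lesion_id,cog_bl,img_id_bl,cog_propagated,cog_fu,img_id_fu,lesion_type",
    '1,"[10, 20, 30]",0,"[11, 21, 31]","[12, 22, 32]",0,lung',
    '2,"[40, 50, 60]",0,,,,liver',
    '3,,,,"[70, 80, 90]",1,skin',
])

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
statuses = {row["lesion_id"]: lesion_status(row) for row in rows}
```

For a real patient file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle and inspect a few cells first to confirm that the coordinate format matches what `parse_cog` expects.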
We demonstrate how this dataset can be used for deep learning-based automated analysis of CT data and provide the trained deep learning model: www.autopet.org
CT acquisition protocol
All CT scans were acquired using Siemens CT scanners, including Siemens Sensation 64, Siemens SOMATOM Definition AS, Siemens SOMATOM Definition Flash, Siemens SOMATOM Force, and the Siemens Biograph128 PET/CT scanner. Patients were scanned using an in-house whole-body staging protocol in the supine position with arms raised above the head. The scanning procedure was performed during the portal-venous phase after the administration of body-weight-adapted contrast medium via the cubital vein.
To ensure consistent image quality, attenuation-based tube current modulation (CARE Dose, reference mAs 240) and a fixed tube voltage of 120 kV were applied. The following scan parameters were used across different CT scanners:
SOMATOM Force: Collimation 128 × 0.6 mm, rotation time 0.5 s, pitch 0.6.
Sensation64: Collimation 64 × 0.6 mm, rotation time 0.5 s, pitch 0.6.
SOMATOM Definition Flash: Collimation 128 × 0.6 mm, rotation time 0.5 s, pitch 1.0.
SOMATOM Definition AS: Collimation 64 × 0.6 mm, rotation time 0.5 s, pitch 0.6.
Biograph128: Collimation 128 × 0.6 mm, rotation time 0.5 s, pitch 0.8.
Slice thickness and increment were set to 3 mm, and image reconstruction was performed using a medium smooth kernel.
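The 3 mm slice spacing stated above can be checked against the voxel spacing stored in a NIfTI header. The following sketch reads only the fixed 348-byte NIfTI-1 header of a gzipped file using the standard library; it assumes the files are NIfTI-1 (not NIfTI-2) and that slices lie along the third image axis, neither of which is confirmed here. The function name `read_nifti1_spacing` and the demo header values are illustrative, not taken from the dataset.

```python
import gzip
import os
import struct
import tempfile


def read_nifti1_spacing(path):
    """Return (shape, spacing) from the header of a gzipped NIfTI-1 file."""
    with gzip.open(path, "rb") as fh:
        header = fh.read(348)
    if len(header) < 348:
        raise ValueError("file too short for a NIfTI-1 header")
    # sizeof_hdr (first int32) must equal 348; this also reveals byte order.
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", header[:4])[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    dim = struct.unpack(endian + "8h", header[40:56])      # dim[0] = number of axes
    pixdim = struct.unpack(endian + "8f", header[76:108])  # pixdim[1..] = spacing
    ndim = dim[0]
    return dim[1:1 + ndim], pixdim[1:1 + ndim]


# Demo: write a synthetic header (no voxel data) and read it back.
demo_header = bytearray(348)
struct.pack_into("<i", demo_header, 0, 348)
struct.pack_into("<8h", demo_header, 40, 3, 512, 512, 300, 1, 1, 1, 1)
struct.pack_into("<8f", demo_header, 76, 1.0, 0.75, 0.75, 3.0, 0.0, 0.0, 0.0, 0.0)
demo_header[344:348] = b"n+1\x00"  # NIfTI-1 single-file magic string

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.nii.gz")
    with gzip.open(demo_path, "wb") as fh:
        fh.write(bytes(demo_header))
    demo_shape, demo_spacing = read_nifti1_spacing(demo_path)
```

Reading only the header avoids decompressing the full volume, which keeps a spacing check over many whole-body scans fast. For anything beyond such a check, a dedicated NIfTI library that also handles orientation is the safer choice.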
Annotation
All data were manually annotated by two experienced radiologists. To this end, tumor lesions were manually segmented on the CT image data using dedicated software.
The following annotation protocol was defined:
Step 1: Identification of tumor lesions by visual assessment of CT information together with the clinical examination reports.
Step 2: Manual free-hand segmentation of identified lesions in axial slices.
Step 3: Baseline and follow-up segmentations are viewed side-by-side to mark the matching lesions.
Additional details
- German Research Foundation (DFG): EXC 2064: Machine Learning: New Perspectives for Science (2064/1-390727645)
- German Research Foundation (DFG): EXC 2180: Image-Guided and Functionally Instructed Tumor Therapies (iFIT) (2180/1-390900677)
- German Research Foundation (DFG): A whole body Radiomics approach in patients with metastatic melanoma undergoing systemic therapy: Fully automated longitudinal segmentation and Deep Learning-based outcome prediction (428216905)
- Accuracy
The dataset consists of high-quality CT scans acquired using state-of-the-art scanners following standardized imaging protocols. Lesions are manually segmented, ensuring a high degree of accuracy in tumor delineation.
- Completeness
The dataset includes two imaging timepoints (baseline and follow-up) for each of the 333 patients. All expected attributes, such as imaging data, segmentation masks, and basic patient demographics (age, sex), are provided, ensuring completeness.
- Conformity
Each dataset includes the NIfTI files converted from the original DICOM files of the CT scans along with corresponding segmentation masks. Non-imaging information such as primary diagnosis, age, and sex is also provided, ensuring that all expected attributes and related entity instances are present for each subject. The conversion scripts can be found on GitHub.
- Consistency
All imaging data were acquired at a single site (UKT) with standardized scanner settings and protocols across several scanner models. The dataset maintains coherence in image acquisition, resolution, and reconstruction parameters, ensuring consistency.
- Credibility
The dataset originates from a well-established medical institution (UKT) and follows rigorous data acquisition and annotation protocols, making it credible for research and clinical applications.
- Processability
The data is provided in NIfTI format, which is widely supported by medical imaging software and machine learning frameworks. Scripts for data conversion (DICOM, mha, hdf5) further enhance processability.
- Relevance
The dataset is highly relevant for AI-based lesion detection, segmentation, and therapy response assessment in oncology. The inclusion of longitudinal imaging makes it valuable for research on tumor progression.
- Timeliness
The dataset comprises imaging data collected from melanoma patients undergoing therapy, ensuring the data remains clinically relevant.
- Understandability
The dataset follows standard medical imaging conventions, and documentation is provided for imaging protocols and acquisition details, making it understandable for researchers and clinicians.
- Data File format
- zip
- Data source type
- medical/clinical registers/records/accounts
- General data format
- still image
- Code repository
- https://github.com/lab-midas/autoPETCTIV
- Copyright holder
- University Hospital Tübingen
- Copyright year
- 2025
- Is accessible for free
- Yes