Lungs Dataset
An anonymized lung cancer imaging dataset containing CT DICOM scans (with paired PET scans for a subset of patients) alongside clinical patient metadata.
Dataset Summary
| Attribute | Value |
|---|---|
| Total patients (per patient sheet) | 457 |
| Total patients (with imaging folders) | 452 |
| Male / Female | 285 / 171 |
| Age range | 38 – 104 years (mean 68.5) |
| PET-positive cases | 103 confirmed, 5 unclear |
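One way to sanity-check these figures is to recompute them from the patient sheet. The sketch below assumes each row has already been loaded (for example with `pandas.read_excel`) into a simple record with hypothetical `sex` and `age` fields; the real column names and codes in `PatientsTableAnonymized.xlsx` may differ. Because male plus female (285 + 171 = 456) is one short of the 457 total, the sketch counts only explicit values instead of assuming every row has a sex recorded.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record shape: the real column names and codes in
    # PatientsTableAnonymized.xlsx are not documented here.
    sex: Optional[str]    # "M", "F", or None when not recorded
    age: Optional[float]  # age in years, or None when not recorded


def summarize(patients: list) -> dict:
    """Recompute the headline figures of the summary table."""
    ages = [p.age for p in patients if p.age is not None]
    return {
        "total": len(patients),
        # Only explicit values are counted, so male + female may be < total.
        "male": sum(1 for p in patients if p.sex == "M"),
        "female": sum(1 for p in patients if p.sex == "F"),
        "age_min": min(ages) if ages else None,
        "age_max": max(ages) if ages else None,
        "age_mean": round(mean(ages), 1) if ages else None,
    }


patients = [Patient("M", 61), Patient("F", 74), Patient(None, 80)]
print(summarize(patients))
```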
Directory Structure
```
lungs_dataset/
├── PatientsTableAnonymized.xlsx       # Clinical metadata summary
└── img/
    ├── <patient_id>/                  # Individual patient folders (numeric IDs)
    │   └── S0001/
    │       └── IMG-XXXX-XXXXX.dcm
    ├── LUNG_DATASET/                  # Patients with paired CT + PET scans
    │   └── <patient_id>/
    │       ├── CT_1_25/
    │       │   └── S0001/
    │       │       └── IMG-XXXX-XXXXX.dcm
    │       └── PET/
    │           └── S0001/
    │               └── IMG-XXXX-XXXXX.dcm
    └── instead of venous and 5/       # Additional patient cohort (32 patients)
        └── <patient_id>/
            └── S0001/
                └── IMG-XXXX-XXXXX.dcm
```
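Because patient folders sit at two different depths, a loader has to treat the two named cohort subfolders specially. A minimal sketch, assuming that every top-level directory under `img/` other than `LUNG_DATASET` and `instead of venous and 5` is a single patient folder; the function name `patient_folders` is hypothetical.

```python
from collections import Counter
from pathlib import Path

# Cohort subfolder names taken from the directory layout.
PAIRED_COHORT = "LUNG_DATASET"
EXTRA_COHORT = "instead of venous and 5"


def patient_folders(img_root: Path) -> list:
    """Return (cohort, patient_id) for every patient folder under img/.

    Assumes any top-level directory other than the two cohort folders
    is itself a patient folder.
    """
    found = []
    for entry in sorted(img_root.iterdir()):
        if not entry.is_dir():
            continue
        if entry.name in (PAIRED_COHORT, EXTRA_COHORT):
            # Cohort folders hold one extra level of patient folders.
            for patient in sorted(entry.iterdir()):
                if patient.is_dir():
                    found.append((entry.name, patient.name))
        else:
            found.append(("top-level", entry.name))
    return found


def count_by_cohort(img_root: Path) -> Counter:
    """Number of patient folders per cohort."""
    return Counter(cohort for cohort, _ in patient_folders(img_root))
```

Running `count_by_cohort(Path("lungs_dataset/img"))` gives per-cohort totals that can be compared against the folder counts listed below.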
- Patient folders: 320 top-level + 100 in `LUNG_DATASET` + 32 in `instead of venous and 5` = 452 total
- 328 of the 457 patients in the sheet have a linked folder ID; 129 have no folder ID assigned
- DICOM slices per scan: 0 – 1,995 (average ~204)
- All images are in standard DICOM format (`.dcm`)
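The slice statistics above can be recomputed by counting `.dcm` files per session folder. A minimal sketch, assuming sessions are always named like `S0001` (as in the layout) and that every file ending in `.dcm` is one slice; `slice_counts`, `slice_stats` and `is_dicom_part10` are hypothetical helpers. The format check relies on the DICOM Part 10 file signature: a 128-byte preamble followed by the four bytes `DICM`.

```python
from pathlib import Path
from statistics import mean


def is_dicom_part10(path: Path) -> bool:
    """True if the file carries the DICOM Part 10 signature:
    a 128-byte preamble followed by the bytes b"DICM"."""
    with path.open("rb") as fh:
        header = fh.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"


def slice_counts(img_root: Path) -> dict:
    """Map each session folder (S0001, ...) to its number of .dcm files.

    Empty sessions are kept with a count of 0 rather than dropped.
    """
    counts = {}
    for session in sorted(img_root.rglob("S[0-9]*")):
        if session.is_dir():
            rel = str(session.relative_to(img_root))
            counts[rel] = sum(1 for f in session.glob("*.dcm") if f.is_file())
    return counts


def slice_stats(counts: dict) -> tuple:
    """Return (min, max, mean) slices per session."""
    values = list(counts.values())
    return min(values), max(values), mean(values)
```

A file that fails `is_dicom_part10` is not necessarily unreadable: some DICOM writers omit the preamble, and pydicom can still attempt to read such files via `pydicom.dcmread(path, force=True)`.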
Cancer Types
| Type | Count |
|---|---|
| Adenocarcinoma (all subtypes) | 345 |
| Squamous cell carcinoma | 40 |
| Non-keratinizing squamous cell carcinoma | 29 |
| NSCLC-NOS | 16 |
| Non-NSCLC / unclear | 10 |
| Large cell carcinoma | 1 |
| Adenosquamous carcinoma | 1 |
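The "all subtypes" label implies that individual adenocarcinoma subtype entries were rolled up into a single row. A sketch of such a roll-up, assuming free-text diagnosis strings and hypothetical substring rules; the actual labels used in the patient sheet are not listed here, so the `RULES` table would need adjusting to them.

```python
from collections import Counter

# Hypothetical matching rules, checked in order: the first substring found
# in the lower-cased diagnosis decides the category. More specific labels
# come first so that non-keratinizing squamous cell carcinoma is not
# folded into plain squamous cell carcinoma.
RULES = [
    ("adenosquamous", "Adenosquamous carcinoma"),
    ("adenocarcinoma", "Adenocarcinoma (all subtypes)"),
    ("non-keratinizing squamous", "Non-keratinizing squamous cell carcinoma"),
    ("squamous", "Squamous cell carcinoma"),
    ("large cell", "Large cell carcinoma"),
    ("nsclc-nos", "NSCLC-NOS"),
]
FALLBACK = "Non-NSCLC / unclear"


def categorize(diagnosis: str) -> str:
    """Map a free-text diagnosis to one row of the cancer-type table."""
    text = diagnosis.strip().lower()
    for needle, label in RULES:
        if needle in text:
            return label
    return FALLBACK


def tally(diagnoses) -> Counter:
    """Count diagnoses per cancer-type row."""
    return Counter(categorize(d) for d in diagnoses)
```

Ordered substring rules keep the mapping easy to audit; a lookup table of exact sheet labels would be stricter if the set of labels is small and known.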
Biomarkers
| Biomarker | Count |
|---|---|
| TTF1 | 308 |
| CK7 | 242 |
| NAPSIN | 113 |
| NAPSIN A | 89 |
| P63 | 83 |
| CK5 | 48 |
| P40 | 44 |
| CK AE1/AE3 | 30 |
| KI67 | 21 |
| Not given | 18 |
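The biomarker counts add up to 996, well above the 457 patients, which suggests a single patient can contribute to several rows. A sketch of how such a multi-valued tally could be produced, assuming (hypothetically) one comma-separated marker string per patient; the real field name and separator in the sheet are not documented here.

```python
from collections import Counter
from typing import Iterable, Optional


def tally_biomarkers(entries: Iterable[Optional[str]], sep: str = ",") -> Counter:
    """Count, per biomarker, how many patients list it.

    Each entry is assumed to be one delimited string per patient, e.g.
    "TTF1, CK7, NAPSIN A". Labels are stripped and upper-cased but
    otherwise kept verbatim, so "NAPSIN" and "NAPSIN A" stay separate.
    Missing or empty entries are counted under "Not given".
    """
    counts = Counter()
    for entry in entries:
        # A set, so a marker repeated within one entry is counted once.
        markers = {m.strip().upper() for m in (entry or "").split(sep) if m.strip()}
        if markers:
            counts.update(markers)
        else:
            counts["Not given"] += 1
    return counts
```

Keeping `NAPSIN` and `NAPSIN A` distinct mirrors the table; merging them would be a deliberate normalization step on top of this.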
Notes
- All patient identifiers have been anonymized.
- The `LUNG_DATASET` subfolder contains paired PET/CT scans for a subset of patients (100), useful for multimodal analysis.
- The `instead of venous and 5` subfolder contains 32 additional patients with CT scans only.
- Some patients have zero-slice sessions (empty folders), which is why the minimum slice count per scan is 0.
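For multimodal work, a usual first step is to line up each patient's CT and PET series. A minimal sketch, assuming the `LUNG_DATASET/<patient_id>/{CT_*,PET}/S####/` layout from the directory tree and treating any folder whose name starts with `CT` as a CT series (the tree only shows `CT_1_25`, so other names are a guess); `paired_sessions` and `modality_of` are hypothetical helpers. Skipping sessions without `.dcm` files also filters out the empty folders.

```python
from pathlib import Path
from typing import Optional


def modality_of(folder_name: str) -> Optional[str]:
    """Guess the modality from a folder name: PET/ or CT_* (e.g. CT_1_25)."""
    name = folder_name.upper()
    if name == "PET":
        return "PET"
    if name.startswith("CT"):
        return "CT"
    return None


def paired_sessions(lung_dataset: Path) -> dict:
    """Map patient_id -> {"CT": [sessions], "PET": [sessions]}.

    Sessions without any .dcm file are skipped, and patients lacking a
    non-empty session for either modality are left out.
    """
    pairs = {}
    for patient in sorted(p for p in lung_dataset.iterdir() if p.is_dir()):
        found = {}
        for series_dir in sorted(patient.iterdir()):
            modality = modality_of(series_dir.name)
            if modality is None or not series_dir.is_dir():
                continue
            sessions = [
                s for s in sorted(series_dir.iterdir())
                if s.is_dir() and any(s.glob("*.dcm"))
            ]
            if sessions:
                found.setdefault(modality, []).extend(sessions)
        if "CT" in found and "PET" in found:
            pairs[patient.name] = found
    return pairs
```

Calling `paired_sessions(Path("lungs_dataset/img/LUNG_DATASET"))` keeps only patients with both modalities present; dropping the final check would also retain CT-only or PET-only patients.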