Get the ATLAS dataset
The ATLAS dataset
The ATLAS dataset was largely acquired at the University Hospital, Dijon 21 000, France and consists of real-world clinical acquisitions of hepatocellular carcinoma (HCC). The dataset was anonymized and processed in accordance with the rules established by the Ethical Committee of the University Hospital of Dijon. All administrative information included in the metadata has been removed, making the data untraceable. Thus, in accordance with French law, it was not necessary to obtain an ethical approval number.
The CC BY-NC-SA 4.0 license lets participants remix, adapt, and build upon the ATLAS dataset non-commercially, as long as they credit the ATLAS dataset and license their new creations under identical terms.
All the details regarding the dataset are provided in the following paper:
F. Quinton, R. Popoff, B. Presles, S. Leclerc, F. Meriaudeau, G. Nodari, O. Lopez, J. Pellegrinelli, O. Chevallier, D. Ginhac, J.-M. Vrigneaud, J.-L. Alberini. "A Tumor and Liver Automatic Segmentation (ATLAS) dataset: a hepatocellular carcinoma dataset for automatic segmentation of liver and liver tumours on contrast-enhanced magnetic resonance imaging". Data 2023, 8(5), 79. https://doi.org/10.3390/data8050079
Dataset organization
The overall dataset is divided into two sets:
- A training set of 60 patients acquired from 2012 to 2020, containing 60 images along with the 60 corresponding labels for liver and tumor in NIfTI format.
- A test set of 30 patients acquired from 2020 to 2023, containing 30 images along with the 30 corresponding labels for liver and tumor in NIfTI format.
Patient information for the training set, including acquisition year, image resolution, contrast phase, MRI sequence, and MRI manufacturer, is provided in the file "patient_info_train.json". A similar file for the test set, "patient_info_test.json", will be available during the submission process.
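As an illustration of how the patient information file could be explored, the sketch below counts patients per value of one metadata field. The internal layout of "patient_info_train.json" is not described here, so the structure used (a mapping from patient ID to a dictionary of attributes) and the key names "acquisition_year" and "manufacturer" are assumptions made for the example only and should be adapted to the actual file.

```python
import json
from collections import Counter
from pathlib import Path


def load_patient_info(path):
    """Read a patient metadata JSON file into a Python object."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def count_by_field(patient_info, field):
    """Count patients per value of `field`.

    ASSUMPTION: `patient_info` maps a patient ID to a dict of attributes.
    Patients that lack the field are counted under None.
    """
    return Counter(record.get(field) for record in patient_info.values())


# Hypothetical records: the real key names and values may differ.
example_info = {
    "patient_001": {"acquisition_year": 2013, "manufacturer": "Siemens"},
    "patient_002": {"acquisition_year": 2019, "manufacturer": "Siemens"},
    "patient_003": {"acquisition_year": 2020, "manufacturer": "GE"},
}

manufacturer_counts = count_by_field(example_info, "manufacturer")
print(manufacturer_counts)
```

With the real file, the in-memory example would be replaced by something like load_patient_info("patient_info_train.json"), once the key names have been checked against its contents.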
The Python code which will be used for metric calculation is also provided in the folder "metric_calculation".
Since the dataset was acquired over 11 years, the quality of patient care and the quality of acquisitions have gradually improved. Thus, there is a domain shift between the older and the more recent acquisitions, with the most recent acquisitions being more likely to resemble future acquisitions. For this reason, the test set consists of the most recent patients of the dataset. The test dataset will be released when the website closes at the end of 2025.
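Because the test set contains the most recent patients, one way to mimic this shift locally is to hold out the most recent training patients for validation. The sketch below is only a suggestion, not part of the official protocol. It assumes that an acquisition year is available per patient (for example from the patient information file); the function name and the patient IDs are hypothetical.

```python
def chronological_split(years_by_patient, n_validation):
    """Hold out the `n_validation` most recent patients for validation.

    `years_by_patient` maps a patient ID to its acquisition year.
    Ties on the year are broken by patient ID so the split is deterministic.
    Returns (train_ids, validation_ids), both ordered from oldest to newest.
    """
    if not 0 <= n_validation <= len(years_by_patient):
        raise ValueError("n_validation must be between 0 and the number of patients")
    ordered = sorted(years_by_patient, key=lambda pid: (years_by_patient[pid], pid))
    cut = len(ordered) - n_validation
    return ordered[:cut], ordered[cut:]


# Hypothetical patient IDs and acquisition years.
years = {"p1": 2012, "p2": 2018, "p3": 2020, "p4": 2015, "p5": 2019}
train_ids, validation_ids = chronological_split(years, 2)
print(train_ids)       # ['p1', 'p4', 'p2']
print(validation_ids)  # ['p5', 'p3']
```

A random split would mix old and recent acquisitions in both subsets, which may give an overly optimistic estimate of performance on the more recent test patients.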
Data source
The contrast-enhanced MRI (CE-MRI) images of the ATLAS dataset, obtained with gadolinium-based contrast agents, were mainly acquired on five Siemens 1.5T and 3T MRI machines. A few acquisitions were made on General Electric (GE) 1.5T MRI machines.
Each acquisition is T1-weighted in the transverse plane with an ultrafast gradient echo sequence. On Siemens equipment, a Volumetric Interpolated Breath-hold Examination (VIBE) sequence or a derived VIBE sequence, such as VIBE TWIST or VIBE CAIPIRINHA, was used. On GE equipment, Liver Acquisition with Volume Acceleration (LAVA) or LAVA FLEX sequences were used. Fat saturation (FATSAT) was applied to all acquisitions.
Each exam results in three to five CE-MR images, of which one is selected to be added to the dataset. The selected image is usually one of the three post-contrast injection phases (arterial, portal or delayed) and is also the one used by the experts to delineate the liver and tumor(s). In some rare cases, the selected image was acquired without contrast agent injection.
Finally, each CE-MRI of the ATLAS dataset consists of a 3D image of 44 to 136 transverse slices of the thorax and abdomen covering the entire liver and tumor. Each slice has a pixel spacing between 0.68 x 0.68 mm² and 1.41 x 1.41 mm², and the slice thickness is between 2 mm and 4 mm.
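The ranges above can serve as a quick sanity check after loading a volume. The sketch below is a minimal example, assuming the slice count and the voxel spacing (dx, dy, dz) in millimetres have already been read from the NIfTI header; with a reader such as nibabel these would typically come from img.shape and img.header.get_zooms(). It also assumes that the third axis is the slice axis and that the slice spacing equals the slice thickness, which may not hold for every file. The function names and the tolerance are hypothetical.

```python
# Ranges quoted in the dataset description.
SLICE_COUNT_RANGE = (44, 136)
PIXEL_SPACING_RANGE_MM = (0.68, 1.41)
SLICE_THICKNESS_RANGE_MM = (2.0, 4.0)


def _in_range(value, bounds, tolerance=0.0):
    """Return True if `value` lies within `bounds`, widened by `tolerance`."""
    low, high = bounds
    return low - tolerance <= value <= high + tolerance


def check_geometry(n_slices, spacing_mm, tolerance=0.01):
    """Compare a volume's geometry with the documented ranges.

    `spacing_mm` is (dx, dy, dz); dz is ASSUMED to be the slice thickness.
    `tolerance` (in mm) absorbs rounding of the published values.
    Returns a list of messages; an empty list means everything is in range.
    """
    dx, dy, dz = spacing_mm
    problems = []
    if not _in_range(n_slices, SLICE_COUNT_RANGE):
        problems.append(f"slice count {n_slices} outside {SLICE_COUNT_RANGE}")
    in_plane_ok = (_in_range(dx, PIXEL_SPACING_RANGE_MM, tolerance)
                   and _in_range(dy, PIXEL_SPACING_RANGE_MM, tolerance))
    if not in_plane_ok:
        problems.append(f"pixel spacing {dx} x {dy} mm outside {PIXEL_SPACING_RANGE_MM}")
    if not _in_range(dz, SLICE_THICKNESS_RANGE_MM, tolerance):
        problems.append(f"slice thickness {dz} mm outside {SLICE_THICKNESS_RANGE_MM}")
    return problems


def voxel_volume_mm3(spacing_mm):
    """Volume of one voxel in cubic millimetres."""
    dx, dy, dz = spacing_mm
    return dx * dy * dz


# Hypothetical geometry of one volume.
issues = check_geometry(80, (0.9, 0.9, 3.0))
print(issues or "geometry within the documented ranges")
```

Multiplying the voxel volume by the number of voxels carrying a given label gives the liver or tumor volume in mm³, which is one reason to verify the spacing before computing volume-based statistics.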
Using the selected CE-MRI, the liver and tumor(s) contours were manually delineated by an experienced MRI radiologist on transverse slices to produce the labels. No pre-processing was applied.