Studio di radiomica per il trattamento del tumore del polmone non a piccole cellule / D. Ghittori. - (2024 Nov 14).

Studio di radiomica per il trattamento del tumore del polmone non a piccole cellule (Radiomics study for the treatment of non-small cell lung cancer)

D. Ghittori
2024

Abstract

Diagnostic imaging offers important support in the care of cancer patients. Imaging data complement clinical data and can be morphological or functional in nature. Radiomic investigations aim to extract information from medical images that may be invisible to the human eye, in order to predict a clinical endpoint. The image region containing potentially useful information is segmented. Its content is then converted into numerical features through mathematical formulas that take into account the position and intensity of individual voxels. Once obtained, these features can be used to build and train predictive models. The many study parameters that can influence feature values must be investigated in order to improve the predictive power of the model.
The main objective of this study is to evaluate the possible role of radiomics, performed on pre-treatment 18F-FDG PET images, in classifying non-small cell lung cancer (NSCLC) into two histological categories: adenocarcinoma (ADC) and squamous cell carcinoma (SPINO, from the Italian spinocellulare). For this purpose, the data of 280 patients with NSCLC of known histology who underwent PET/CT between August 2014 and November 2023 were retrospectively analyzed. All images were acquired on the same PET/CT scanner. Five different semi-automatic methods were used for lesion segmentation. For each segmentation method, five feature sets were extracted with the LIFEx v7.16 software, each using a different SUV resampling method. The features used for model training were selected with the minimum Redundancy Maximum Relevance (mRMR) algorithm. Six machine learning models were trained and compared: Logistic Regression (LR), Support Vector Classifier (SVC), Random Forest (RF), Extreme Gradient Boosting (XGB), Gaussian Process (GP) and Multilayer Perceptron Classifier (MLP).
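The abstract does not state which mRMR variant or mutual-information estimator was used. The sketch below is a minimal illustration only. It assumes the classic mutual-information difference (MID) criterion, which scores a candidate by its relevance to the class label minus its mean redundancy with already-selected features. It also assumes that features have already been discretized into integer bins. All feature names and values are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr(features, labels, k):
    """Greedy mRMR with the MID criterion (assumed variant, not necessarily the study's).

    features: dict mapping a (hypothetical) feature name -> list of binned values.
    labels:   list of class labels, e.g. 0 = ADC, 1 = SPINO.
    Returns up to k feature names in order of selection.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        # Mean mutual information with the features picked so far.
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 lesions, binary label, binned feature values.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "feat_strong": [0, 0, 0, 1, 1, 1, 1, 1],
    "feat_strong_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # fully redundant with feat_strong
    "feat_complementary": [1, 0, 0, 0, 1, 1, 1, 0],
    "feat_noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(mrmr(features, labels, k=2))  # ['feat_strong', 'feat_complementary']
```

The duplicate of the strongest feature is as relevant as the original. It is still passed over, because its redundancy with the feature already chosen outweighs that relevance. This is the behaviour that distinguishes mRMR from simply ranking features by relevance.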
The models were first trained on the whole sample. The effect of lesion-volume variability on predictive power was then investigated by dividing the dataset into subgroups homogeneous in volume and repeating the training on each subgroup and on groupings of several subgroups. A classification model was also trained on a subset of the collected clinical data. At a later stage, a model combining clinical and radiomic features was trained to verify whether adding radiomics enhanced predictive power. Ten-fold cross-validation was used for the internal validation of all models. Their predictive capabilities were compared using the area under the receiver operating characteristic curve (ROC AUC). The segmentation methods were evaluated according to three criteria: time required, operator experience required and resulting predictive power. Segmentation with a threshold at 40% of the maximum SUV was the best of the methods examined when all three criteria were considered together. When only radiomic data were used, the best models often relied on morphological features, such as sphericity and volume, and on second-order features, such as the uniformity of voxel intensities, because of their high discriminative power. The highest AUC, 0.86, was achieved with the Nestle segmentation method, with features extracted using a fixed number of bins and an XGB model trained on a sample highly homogeneous in volume (lesions between 3 and 9 cm³). With 10-fold cross-validation repeated 50 times, the model trained on clinical data only (lesions from 3 to 70 cm³) achieved a best AUC of 0.72. This was significantly lower than the AUC of the model trained on radiomic features only on the same data subset (0.76, p < 0.05). Combining clinical and radiomic features did not increase predictive power (AUC 0.76).
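The 40%-of-SUVmax threshold, which gave the best overall trade-off, can be sketched in a few lines. This is a minimal illustration under stated assumptions. The input is a flat list of SUVs already restricted to an operator-drawn bounding VOI. The sketch omits the connectivity constraint and other refinements that clinical software may apply. The voxel size and SUV values are hypothetical.

```python
def threshold_segmentation(suv_values, fraction=0.40):
    """Select voxels with SUV >= fraction * SUVmax inside a bounding VOI.

    suv_values: flat list of SUVs in the operator-drawn VOI (assumed input format).
    Connectivity to the hottest voxel is NOT enforced in this sketch.
    Returns a list of booleans (the lesion mask).
    """
    if not suv_values:
        raise ValueError("the bounding VOI contains no voxels")
    threshold = fraction * max(suv_values)
    return [suv >= threshold for suv in suv_values]


def segmented_volume_cm3(mask, voxel_volume_mm3):
    """Volume of the segmented lesion in cm^3 (1 cm^3 = 1000 mm^3)."""
    return sum(mask) * voxel_volume_mm3 / 1000.0


# Hypothetical VOI: SUVmax = 10.0, so the 40% threshold is 4.0.
voi = [1.2, 2.0, 4.5, 8.0, 10.0, 6.1, 3.9, 0.8]
mask = threshold_segmentation(voi)
print(mask)  # [False, False, True, True, True, True, False, False]
print(segmented_volume_cm3(mask, voxel_volume_mm3=4 * 4 * 4))  # 4 voxels x 64 mm^3 = 0.256 cm^3
```

Because the threshold is relative to each lesion's own maximum, the only operator input is the bounding VOI. This is consistent with the low time and experience requirements reported for this method.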
It can be concluded from this study that, for classifying lesions as ADC or SPINO, threshold segmentation at 40% of the maximum SUV is advisable, with the number of bins kept fixed during feature extraction. When the sample size allows it, training separate models on subgroups homogeneous in volume increases predictive power compared with a single model trained on a sample more heterogeneous in volume. Features extracted from small lesions are statistically less reliable because they are computed on fewer voxels. Clinical features alone are enough to train a useful classification model (AUC > 0.70), although its predictive power is lower. The combined clinical-radiomic model does not perform significantly better than the radiomic model. Validating these results on an external dataset should be the goal of a future study.
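Keeping the number of bins fixed defines grey levels relative to each lesion's own SUV range. A fixed bin width, by contrast, ties grey levels to absolute SUV values. The minimal sketch below shows the difference. It assumes the discretization formulas of the Image Biomarker Standardisation Initiative (IBSI) rather than LIFEx's exact internal implementation. The bin count, bin width, lower bound and SUVs are arbitrary.

```python
import math


def discretize_fixed_bin_number(suv_values, n_bins):
    """Fixed bin number (IBSI-style): n_bins equal bins spanning the ROI's own [min, max]."""
    lo, hi = min(suv_values), max(suv_values)
    if hi == lo:
        return [1] * len(suv_values)  # uniform ROI: every voxel falls in one bin
    # The maximum is assigned to the last bin; all other values use floor(...) + 1.
    return [
        n_bins if v == hi else math.floor(n_bins * (v - lo) / (hi - lo)) + 1
        for v in suv_values
    ]


def discretize_fixed_bin_width(suv_values, bin_width, lower_bound=0.0):
    """Fixed bin width (IBSI-style): bin edges at lower_bound + k * bin_width."""
    return [math.floor((v - lower_bound) / bin_width) + 1 for v in suv_values]


lesion = [2.0, 3.0, 5.0, 10.0]      # hypothetical SUVs
brighter = [2 * v for v in lesion]  # same pattern, twice the uptake

print(discretize_fixed_bin_number(lesion, 4))     # [1, 1, 2, 4]
print(discretize_fixed_bin_number(brighter, 4))   # [1, 1, 2, 4]
print(discretize_fixed_bin_width(lesion, 2.5))    # [1, 2, 3, 5]
print(discretize_fixed_bin_width(brighter, 2.5))  # [2, 3, 5, 9]
```

With a fixed number of bins, the two lesions receive identical discretized values. Any texture matrix built from them is therefore identical, so texture features describe the pattern of relative uptake. With a fixed bin width, the absolute SUV level also changes the result.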
LENARDI, CRISTINA
14-nov-2024
Keywords: Radiomics; PT; NSCLC
Specialization thesis (Tesi di specializzazione)
Files in this item:
spec_unimi_S65901.pdf (open access)
Description: specialization thesis
Type: Other
Size: 2.46 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1115530