
The Challenge of Choosing the Best Classification Method in Radiomic Analyses: Recommendations and Applications to Lung Cancer CT Images / F. Corso, G. Tini, G. Lo Presti, N. Garau, S.P. De Angelis, F. Bellerba, L. Rinaldi, F. Botta, S. Rizzo, D. Origgi, C. Paganelli, M. Cremonesi, C. Rampinelli, M. Bellomi, L. Mazzarella, P.G. Pelicci, S. Gandini, S. Raimondi. - In: CANCERS. - ISSN 2072-6694. - 13:12(2021 Jun 21), pp. 3088.1-3088.17. [10.3390/cancers13123088]

The Challenge of Choosing the Best Classification Method in Radiomic Analyses: Recommendations and Applications to Lung Cancer CT Images

F. Bellerba; F. Botta; S. Rizzo; C. Paganelli; C. Rampinelli; M. Bellomi; L. Mazzarella; P.G. Pelicci; S. Raimondi (last author)
2021

Abstract

Simple Summary: Radiomics aims to extract high-dimensional features from clinical images and associate them with clinical outcomes. These associations may be further investigated with machine learning models; however, guidelines on the most suitable method to support clinical decisions are still missing. To improve the reliability and accuracy of radiomic features in predicting a binary variable in a lung cancer setting, we compared several machine learning classifiers and feature selection methods on simulated data. The simulations account for important characteristics that may vary in real clinical datasets: sample size, outcome balancing, and the strength of association between radiomic features and outcome variables. We suggest the most suitable classifiers for each scenario studied and evaluate the impact of methodological choices. Our work highlights the importance of these choices in radiomic analyses and provides guidelines on how to select the best models for the data at hand.

Radiomics uses high-dimensional sets of imaging features to predict biological characteristics of tumors and clinical outcomes. The choice of algorithm used to analyze radiomic features and make predictions strongly affects the results; thus, identifying adequate machine learning methods for radiomic applications is crucial. In this study, we aim to identify suitable analysis approaches for radiomic-based binary predictions according to sample size, outcome balancing, and feature-outcome association strength. Simulated data were obtained by reproducing the correlation structure among 168 radiomic features extracted from computed tomography (CT) images of 270 non-small-cell lung cancer (NSCLC) patients and their associated lymph node status. The performance of six classifiers combined with six feature selection (FS) methods was assessed on the simulated data using the AUC (area under the receiver operating characteristic curve), sensitivity, and specificity.
For all FS methods and regardless of association strength, the tree-based classifiers Random Forest and Extreme Gradient Boosting performed well (AUC ≥ 0.73), showing the best trade-off between sensitivity and specificity. In small samples, performance was generally lower and more variable than in medium-to-large samples. FS methods generally did not improve performance. Thus, in radiomic studies, we suggest evaluating the choice of FS method and classifier in light of the specific sample size, outcome balancing, and association strength.
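As a rough illustration of the simulation-and-evaluation design described in the abstract, the sketch below draws correlated standard-normal features through a Cholesky factor of an assumed correlation matrix, generates a binary outcome from a logistic model (the coefficients set the association strength, the intercept sets the outcome balancing), and computes AUC, sensitivity, and specificity. The multivariate-normal/logistic generator, the helper names (`cholesky`, `simulate`, `auc`, `sensitivity_specificity`), the 3×3 correlation matrix, the coefficients, and the 0.5 threshold are all hypothetical: this record does not describe the authors' actual simulation procedure, and the study used 168 features with trained classifiers rather than a single raw feature as the score.

```python
import math
import random


def cholesky(corr):
    """Return lower-triangular L with L @ L^T == corr (corr must be positive definite)."""
    n = len(corr)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(corr[i][i] - s)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L


def simulate(n, corr, beta, intercept, rng):
    """Hypothetical generator: n samples of correlated N(0, 1) features plus a binary outcome.

    beta controls the feature-outcome association strength; a negative intercept
    lowers the prevalence of positives (an unbalanced outcome).
    """
    L = cholesky(corr)
    p = len(corr)
    X, y = [], []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # x = L z has covariance L L^T = corr
        x = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        eta = intercept + sum(b * xi for b, xi in zip(beta, x))
        prob = 1.0 / (1.0 + math.exp(-eta))
        X.append(x)
        y.append(1 if rng.random() < prob else 0)
    return X, y


def auc(scores, labels):
    """AUC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Classify score >= threshold as positive; return (sensitivity, specificity)."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    tn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 0)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical correlation matrix among three features.
    corr = [[1.0, 0.6, 0.3],
            [0.6, 1.0, 0.2],
            [0.3, 0.2, 1.0]]
    X, y = simulate(300, corr, beta=[1.0, 0.5, 0.0], intercept=-1.0, rng=rng)
    # Placeholder score: the first feature mapped to (0, 1); a real study would use a trained classifier.
    scores = [1.0 / (1.0 + math.exp(-x[0])) for x in X]
    print("prevalence", sum(y) / len(y))
    print("AUC", round(auc(scores, y), 3))
    print("sensitivity, specificity", sensitivity_specificity(scores, y))
```

The AUC here is the Mann-Whitney probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, which equals the area under the empirical ROC curve; in practice a library routine such as scikit-learn's `roc_auc_score` would usually be used instead of this quadratic-time loop.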
Keywords: CT images; balancing; classification; feature selection; lung cancer; machine learning; radiomics; sample size; signal; simulation
Sector MED/04 - General Pathology
21-Jun-2021
Article (author)
File in this record: cancers-13-03088-v2.pdf (open access; type: Publisher's version/PDF; format: Adobe PDF; size: 2.03 MB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/930355
Citations
  • PubMed Central: 3
  • Scopus: 8
  • Web of Science (ISI): 8