Synthetic data generation in genomic cancer medicine: a review of global research trends in the last ten years / V. De Nicoló, M. Frasca, A. Graziosi, G. Gazzaniga, D.L. Torre, A. Pani. - In: DISCOVER ARTIFICIAL INTELLIGENCE. - ISSN 2731-0809. - 5:1(2025 Jul 15), pp. 1-31. [10.1007/s44163-025-00384-9]

Synthetic data generation in genomic cancer medicine: a review of global research trends in the last ten years

M. Frasca (second author); A. Graziosi; G. Gazzaniga; D.L. Torre (penultimate author); A. Pani (last author)
2025

Abstract

This paper reviews the global evolution of synthetic data (SD) generation in genomic cancer medicine, analyzing research trends over the past decade. The use of artificial intelligence, particularly machine learning and deep learning techniques, has transformed this area, providing solutions to overcome the limited availability of real clinical data. Through a bibliometric analysis of a wide sample of scientific articles from Scopus, this study highlights the adoption of SD generation techniques in oncological applications, focusing on major methodologies and challenges. Key application areas, such as multi-omics integration (genomics, transcriptomics, and proteomics) and tumor genomic heterogeneity, emerge as fields of growing interest. Despite challenges in noise management and performance optimization, advanced machine learning techniques prove essential for generating high-quality SD that reflects biological complexity. The study also identifies key open challenges, such as simulation accuracy and noise control, offering insights into future applications of SD in personalized medicine and cancer therapy.
Keywords: Synthetic data; Genomic medicine; Cancer research; Data privacy; Machine learning
Sector BIOS-08/A - Molecular biology
Sector BIOS-11/A - Pharmacology
Sector STAT-04/A - Mathematical methods for economics, actuarial science and finance
15 Jul 2025
Article (author)
File in this record: unpaywall-bitstream-314595643.pdf (open access; type: Publisher's version/PDF; license: Creative Commons; size: 2.73 MB; format: Adobe PDF)
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1175994