IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Background: In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions.
Methods: Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 "High-dimensional data" of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD.
Results: The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided.
Conclusions: This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.

Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges / J. Rahnenführer, R. De Bin, A. Benner, F. Ambrogi, L. Lusa, A. Boulesteix, E. Migliavacca, H. Binder, S. Michiels, W. Sauerbrei, L. McShane. - In: BMC MEDICINE. - ISSN 1741-7015. - 21:1(2023 May 15), pp. 182.1-182.54. [10.1186/s12916-023-02858-y]

Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

Rahnenführer, Jörg; De Bin, Riccardo; Benner, Axel; Ambrogi, F. (Member of the Collaboration Group); Lusa, Lara; Boulesteix, Anne-Laure; Migliavacca, Eugenia; Binder, Harald; Michiels, Stefan; Sauerbrei, Willi; McShane, Lisa

2023

Keywords: Analytical goals; Clustering; Exploratory data analysis; High-dimensional data; Initial data analysis; Multiple testing; Omics data; Prediction; STRATOS initiative
Scientific-disciplinary sectors of the article (display only): MED/01 - Medical Statistics
Project title: Innovative statistical methods in biomedical research on biomarkers: from their identification to their use in clinical practice
Funder: MINISTERO DELL'ISTRUZIONE E DEL MERITO
Contract no.: 20178S4EK9_004
Publication date: 15 May 2023
Journal in ANCE: BMC MEDICINE
DOI: https://dx.doi.org/10.1186/s12916-023-02858-y
URL: https://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-023-02858-y
Type: Article (author)
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
12916_2023_Article_2858.pdf (open access; description: Article; type: Publisher's version/PDF)	5.48 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/969941
