Intrinsic-Dimension Analysis for Guiding Dimensionality Reduction in Multi-Omics Data / V. Guarino, J. Gliozzo, F. Clarelli, B. Pignolet, K. Misra, E. Mascia, G. Antonino, S. Santoro, L. Ferré, M. Cannizzaro, M. Sorosina, R. Liblau, M. Filippi, E. Mosca, F. Esposito, G. Valentini, E. Casiraghi - In: Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies. 3: Bioinformatics / [edited by] H. Ali, N. Deng, A. Fred, H. Gamboa. - [s.l.] : Scitepress, 2023. - ISBN 978-989-758-631-6. - pp. 243-251 (Paper presented at the 16th International Joint Conference on Biomedical Engineering Systems and Technologies, held in Lisbon in 2023) [10.5220/0011775200003414].
Intrinsic-Dimension Analysis for Guiding Dimensionality Reduction in Multi-Omics Data
J. Gliozzo (second author); F. Esposito; G. Valentini (penultimate author); E. Casiraghi (last author)
2023
Abstract
Multi-omics data are of paramount importance in biomedicine, providing a comprehensive view of the processes underlying disease. They are high-dimensional and hence affected by the so-called "curse of dimensionality", which ultimately leads to unreliable estimates. This calls for effective Dimensionality Reduction (DR) techniques that embed the high-dimensional data into a lower-dimensional space. Although effective DR methods have been proposed, the high dimension of the initial dataset often requires applying unsupervised Feature Selection (FS) techniques first. Unfortunately, both unsupervised FS and DR techniques require the dimension of the lower-dimensional space to be provided. This is a crucial choice for which no well-accepted solution has yet been defined. The Intrinsic Dimension (ID) of a dataset is defined as the minimum number of dimensions that allow the data to be represented without information loss; the ID of a dataset is therefore related to its informativeness and complexity. In this paper, after proposing a blocking ID estimation approach that leverages state-of-the-art (SOTA) ID estimation methods, we present our DR pipeline, whose successive FS and DR steps are guided by the ID estimate.

File | Access | Type | Size | Format
---|---|---|---|---
ID_DR_BioInf2023.pdf | Restricted access | Publisher's version/PDF | 432.46 kB | Adobe PDF
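
The idea stated in the abstract, letting an ID estimate choose the target dimension of a reduction step, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which SOTA estimators the paper uses, so the well-known TwoNN estimator (in its maximum-likelihood form) stands in for them; the paper's blocking strategy and its FS step are not reproduced; and plain PCA stands in for the DR step. The names `twonn_id` and `pca_reduce` and the synthetic data are hypothetical.

```python
import numpy as np


def twonn_id(X):
    """TwoNN maximum-likelihood ID estimate (illustrative stand-in estimator).

    Assumes X has no duplicate rows, so every nearest-neighbour distance is > 0.
    """
    # Pairwise squared Euclidean distances via the Gram matrix (O(n^2) memory).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)          # clip tiny negative values from round-off
    np.fill_diagonal(d2, np.inf)      # a point is not its own neighbour
    # Two smallest squared distances per row: column 0 <= column 1.
    nearest = np.partition(d2, 1, axis=1)[:, :2]
    # log(r2 / r1) computed from squared distances.
    log_mu = 0.5 * np.log(nearest[:, 1] / nearest[:, 0])
    return X.shape[0] / np.sum(log_mu)


def pca_reduce(X, k):
    """Project X onto its first k principal components (stand-in DR step)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


# Synthetic data (hypothetical): a 2-dimensional sheet embedded in 10 dimensions.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(500, 2))
X = latent @ rng.normal(size=(2, 10))

id_hat = twonn_id(X)                  # should be close to 2 for this data
k = max(1, int(np.rint(id_hat)))      # ID estimate sets the target dimension
Z = pca_reduce(X, k)
print(f"estimated ID: {id_hat:.2f}, reduced shape: {Z.shape}")
```

On real multi-omics matrices, with far more features than samples, a single global estimate like this one may be unreliable, which is presumably the kind of difficulty the blocking estimation mentioned in the abstract is meant to address.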