A novel intrinsic dimensionality estimator based on rank-order statistics / S. Bassis, A. Rozza, C. Ceruti, G. Lombardi, E. Casiraghi, P. Campadelli. - In: Clustering high-dimensional data : first International workshop, CHDD 2012, Naples, Italy, May 15, 2012 : revised selected papers / edited by F. Masulli, A. Petrosino, S. Rovetta. - First edition. - Berlin : Springer, 2015. - ISBN 9783662485774. - pp. 102-117. (Paper presented at the 1st International Workshop on Clustering High-Dimensional Data, CHDD, held in Naples (Italy) in 2012.)
A novel intrinsic dimensionality estimator based on rank-order statistics
S. Bassis; C. Ceruti; E. Casiraghi; P. Campadelli
2015
Abstract
Over the past two decades, estimating the intrinsic dimensionality of a dataset has gained considerable importance, since it provides relevant information for several real-life applications. Although a great deal of research effort has been devoted to developing effective intrinsic dimensionality estimators, the problem is still open. In this paper we therefore propose a novel, robust intrinsic dimensionality estimator that exploits the information conveyed by the normalized nearest neighbor distances through a technique based on rank-order statistics, which limits the common underestimation issues related to the edge effect. Experiments on both synthetic and real datasets highlight the robustness and effectiveness of the proposed algorithm compared to state-of-the-art methodologies.

File | Access | Type | Size | Format
---|---|---|---|---
chdd13id.pdf | Restricted access | Pre-print (manuscript submitted to the publisher) | 260.19 kB | Adobe PDF
chp_10.1007_978-3-662-48577-4_7.pdf | Restricted access | Publisher's version/PDF | 360.82 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.