Minimum Neighbor Distance Estimators of Intrinsic Dimension

Lombardi, G.; Rozza, A.; Ceruti, C.; Casiraghi, E.; Campadelli, P.

doi:10.1007/978-3-642-23783-6_24

Most machine learning techniques suffer from the “curse of dimensionality” when applied to high-dimensional data. To address this limitation, a common preprocessing step is to apply a dimensionality reduction technique, and a great deal of research has been devoted to developing algorithms for this task. These techniques often require the number of dimensions to retain as a parameter; to set it, they need an estimate of the “intrinsic dimensionality” of the given dataset, that is, the minimum number of degrees of freedom needed to capture all the information carried by the data. Although many estimation techniques have been proposed, most of them fail on noisy data or when the intrinsic dimensionality is too high. In this paper we present a family of estimators based on the probability density function of the normalized nearest neighbor distance. We evaluate the proposed techniques on both synthetic and real datasets, comparing their performance with that of state-of-the-art algorithms; the results indicate that the proposed methods are promising.
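As an illustration of the idea of estimating dimension from a normalized nearest neighbor distance, here is a minimal sketch; it is not taken from the paper and is not claimed to be the authors' estimator. It assumes the data are locally uniformly distributed on a d-dimensional manifold, so that the ratio rho = r1 / r2 of the nearest to the second-nearest neighbor distance has density d * rho^(d-1) on (0, 1), whose maximum-likelihood estimate is d = -N / sum(log rho_i). The function name `estimate_intrinsic_dimension` and the toy data are hypothetical.

```python
import numpy as np


def estimate_intrinsic_dimension(points):
    """Hypothetical helper: maximum-likelihood estimate of intrinsic dimension.

    Uses rho = r1 / r2 (nearest over second-nearest neighbor distance) and
    assumes locally uniform density and no duplicate points.
    """
    x = np.asarray(points, dtype=float)
    # Brute-force pairwise Euclidean distances (fine for a few thousand points).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # Two smallest distances per row: column 0 is r1, column 1 is r2.
    nearest = np.partition(dist, 1, axis=1)[:, :2]
    rho = nearest[:, 0] / nearest[:, 1]
    # Under the density d * rho**(d - 1), the log-likelihood is maximized here.
    return -len(rho) / np.sum(np.log(rho))


# Toy data: a 3-dimensional uniform cube linearly embedded in 5 dimensions,
# so the true intrinsic dimension is 3 while the ambient dimension is 5.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(5, 3)))  # orthonormal columns
data = rng.uniform(size=(1000, 3)) @ basis.T
estimate = estimate_intrinsic_dimension(data)
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

The brute-force distance matrix keeps the sketch dependency-free; for larger datasets a spatial index would be the more practical choice.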

Minimum Neighbor Distance Estimators of Intrinsic Dimension / G. Lombardi, A. Rozza, C. Ceruti, E. Casiraghi, P. Campadelli. - In: Machine learning and knowledge discovery in databases : European conference, ECML PKDD 2011, Athens, Greece, September 5-9, 2011 : proceedings. 2 / edited by D. Gunopulos, T. Hofmann, D. Malerba, M. Vazirgiannis. - Heidelberg : Springer, 2011 Jun. - ISBN 9783642237829. - pp. 374-389 (European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, held in Athens in 2011).

	Keywords
	
				dimensionality reduction; Intrinsic dimensionality estimation; manifold learning
			
	Scientific-disciplinary sectors of the contribution (display only)
	
				Sector INF/01 - Computer Science
			
	Publication date
	
				Jun-2011
			
	DOI
	
				https://dx.doi.org/10.1007/978-3-642-23783-6_24
			
	Type
	
				Book Part (author)
			
	Appears in types:
	
				03 - Book contribution

Files in this item:

File	Size	Format
lombardi_20.pdf (restricted access; type: Publisher's version/PDF)	234.06 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/163797

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca