Recently, a great deal of research work has been devoted to the development of algorithms to estimate the intrinsic dimensionality (id) of a given dataset, that is the minimum number of parameters needed to represent the data without information loss. id estimation is important for the following reasons: the capacity and the generalization capability of discriminant methods depend on it; id is a necessary information for any dimensionality reduction technique; in neural network design the number of hidden units in the encoding middle layer should be chosen according to the id of data; the id value is strongly related to the model order in a time series, that is crucial to obtain reliable time series predictions.Although many estimation techniques have been proposed in the literature, most of them fail on noisy data, or compute underestimated values when the id is sufficiently high. In this paper, after reviewing some of the most important id estimators related to our work, we provide a theoretical motivation of the bias that causes the underestimation effect, and we present two id estimators based on the statistical properties of manifold neighborhoods, which have been developed in order to reduce this effect. We exhaustively evaluate the proposed techniques on synthetic and real datasets, by employing an objective evaluation measure to compare their performance with those achieved by state of the art algorithms; the results show that the proposed methods are promising, and produce reliable estimates also in the difficult case of datasets drawn from non-linearly embedded manifolds, characterized by high id.

Novel high intrinsic dimensionality estimators / A. Rozza, G. Lombardi, C. Ceruti, E. Casiraghi, P. Campadelli. - In: MACHINE LEARNING. - ISSN 0885-6125. - 89:1-2(2012 Oct), pp. 37-65.

Novel high intrinsic dimensionality estimators

C. Ceruti;E. Casiraghi;P. Campadelli
2012

Abstract

Recently, a great deal of research work has been devoted to the development of algorithms to estimate the intrinsic dimensionality (id) of a given dataset, that is the minimum number of parameters needed to represent the data without information loss. id estimation is important for the following reasons: the capacity and the generalization capability of discriminant methods depend on it; id is a necessary information for any dimensionality reduction technique; in neural network design the number of hidden units in the encoding middle layer should be chosen according to the id of data; the id value is strongly related to the model order in a time series, that is crucial to obtain reliable time series predictions.Although many estimation techniques have been proposed in the literature, most of them fail on noisy data, or compute underestimated values when the id is sufficiently high. In this paper, after reviewing some of the most important id estimators related to our work, we provide a theoretical motivation of the bias that causes the underestimation effect, and we present two id estimators based on the statistical properties of manifold neighborhoods, which have been developed in order to reduce this effect. We exhaustively evaluate the proposed techniques on synthetic and real datasets, by employing an objective evaluation measure to compare their performance with those achieved by state of the art algorithms; the results show that the proposed methods are promising, and produce reliable estimates also in the difficult case of datasets drawn from non-linearly embedded manifolds, characterized by high id.
dimensionality reduction; Intrinsic dimensionality estimation; Manifold learning
Settore INF/01 - Informatica
Settore MAT/06 - Probabilita' e Statistica Matematica
ott-2012
Article (author)
File in questo prodotto:
File Dimensione Formato  
Mlj_STAMPATO.pdf

accesso riservato

Tipologia: Publisher's version/PDF
Dimensione 1.18 MB
Formato Adobe PDF
1.18 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/177956
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 65
  • ???jsp.display-item.citation.isi??? 51
social impact