ASPECTS OF DATA STRUCTURE IN MACHINE LEARNING / V. Erba ; supervisor: S. Caracciolo ; coordinator: M. Paris. Dipartimento di Fisica Aldo Pontremoli, 2021 Oct 21. 34th cycle, Academic Year 2021. [10.13130/erba-vittorio_phd2021-10-21].

ASPECTS OF DATA STRUCTURE IN MACHINE LEARNING

V. Erba
2021

Abstract

It is widely believed that understanding data structure is a crucial ingredient for advancing our comprehension of how (and why) modern machine learning works. Still, most of the theoretical results we have are obtained under highly simplifying assumptions on the structure of the training data. In this Thesis, I review some novel results on the problem of characterizing the geometric structure of datasets and on the consequences that this structure has for learning algorithms. I also provide pedagogical introductions to manifold learning, random geometric graph theory and supervised binary classification. I focus on three different aspects of the problem. First, I review techniques to characterize the intrinsic dimensionality of datasets: this is the first "experimental" step towards proper theoretical modelling of data. Then, I focus on the problem of finding null models of data in high dimensions: does Euclidean structure survive as the dimensionality of the data grows larger and larger? Finally, I study how geometric data structure alters the expressive potential of simple classifiers.
Date: 21 Oct 2021
Scientific sector: FIS/02 - Theoretical Physics, Mathematical Models and Methods
Supervisor: CARACCIOLO, SERGIO
Coordinator: PARIS, MATTEO
Type: Doctoral Thesis
File: phd_unimi_R12359.pdf (complete doctoral thesis, Adobe PDF, 6.16 MB, open access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/873502