A robust approach to model-based classification based on trimming and constraints: Semi-supervised learning in presence of outliers and label noise / A. Cappozzo, F. Greselin, T.B. Murphy. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5347. - 14:2(2020), pp. 327-354. [10.1007/s11634-019-00371-w]

A robust approach to model-based classification based on trimming and constraints: Semi-supervised learning in presence of outliers and label noise

A. Cappozzo (first author); 2020

Abstract

In a standard classification framework, a set of trustworthy learning data is employed to build a decision rule, with the final aim of classifying unlabelled units belonging to the test set. Unreliable labelled observations, namely outliers and data with incorrect labels, can therefore strongly undermine classifier performance, especially if the training set is small. The present work introduces a robust modification to the Model-Based Classification framework, employing impartial trimming and constraints on the ratio between the maximum and the minimum eigenvalue of the group scatter matrices. The proposed method effectively handles noise in both response and explanatory variables, providing reliable classification even when dealing with contaminated datasets. A robust information criterion is proposed for model selection. Experiments on real and simulated data, artificially adulterated, are provided to underline the benefits of the proposed method.
Eigenvalues restrictions; Impartial trimming; Label noise; Model-based classification; Outliers detection; Robust estimation
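
The two ingredients named in the abstract, impartial trimming and a bound on the eigenvalue ratio of the group scatter matrices, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the trimming level `alpha`, the bound `c`, the rounding of the number of trimmed units, and the simple "raise the small eigenvalues" rule are all assumptions made here for illustration, and the published method may determine the constrained eigenvalues differently.

```python
import math


def trimming_indicator(log_densities, alpha):
    """Flag the least plausible units for trimming.

    Returns a list of 0/1 flags: the ceil(n * alpha) observations with the
    lowest log-density get 0 (trimmed), the rest get 1 (kept).
    The rounding rule is an assumption of this sketch.
    """
    n = len(log_densities)
    n_trim = math.ceil(n * alpha)
    # Indices sorted from least to most plausible under the current model.
    order = sorted(range(n), key=lambda i: log_densities[i])
    keep = [1] * n
    for i in order[:n_trim]:
        keep[i] = 0
    return keep


def constrain_eigenvalues(eigs_by_group, c):
    """Force max/min eigenvalue ratio across all groups to be at most c.

    Hypothetical simplification: eigenvalues below (largest / c) are raised
    to that floor; larger eigenvalues are left unchanged.
    """
    if c < 1:
        raise ValueError("c must be at least 1")
    largest = max(max(eigs) for eigs in eigs_by_group)
    floor = largest / c
    return [[max(lam, floor) for lam in eigs] for eigs in eigs_by_group]


# Usage: five units, one of them very implausible, trimmed at alpha = 0.2.
flags = trimming_indicator([-1.0, -50.0, -2.0, -1.5, -3.0], alpha=0.2)

# Usage: eigenvalues of two group scatter matrices, ratio bounded by c = 10.
constrained = constrain_eigenvalues([[4.0, 1.0], [100.0, 0.5]], c=10.0)
```

In a full estimation procedure, steps of this kind would typically be repeated at every iteration, with the trimmed units excluded from the parameter updates; that loop is omitted here.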
Sector SECS-S/01 - Statistics
2020
Article (author)
Files in this item:
Cappozzo, Greselin, Murphy_2020_A robust approach to model-based classification based on trimming and constraints.pdf

Access: restricted
Description: Regular Article
Type: Publisher's version/PDF
Size: 2.05 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1030200
Citations
  • PMC: ND
  • Scopus: 14
  • ISI: 10
  • OpenAlex: ND