Variational inference for semiparametric Bayesian novelty detection in large datasets / L. Benedetti, E. Boniardi, L. Chiani, J. Ghirri, M. Mastropietro, A. Cappozzo, F. Denti. - In: ADVANCES IN DATA ANALYSIS AND CLASSIFICATION. - ISSN 1862-5347. - (2023), pp. 1-23. [Epub ahead of print] [10.1007/s11634-023-00569-z]

Variational inference for semiparametric Bayesian novelty detection in large datasets

A. Cappozzo (penultimate author); 2023

Abstract

After being trained on a fully-labeled training set, where the observations are grouped into a certain number of known classes, novelty detection methods aim to classify the instances of an unlabeled test set while allowing for the presence of previously unseen classes. These models are valuable in many areas, ranging from social network and food adulteration analyses to biology, where an evolving population may be present. In this paper, we focus on a two-stage Bayesian semiparametric novelty detector, also known as Brand, recently introduced in the literature. Leveraging on a model-based mixture representation, Brand allows clustering the test observations into known training terms or a single novelty term. Furthermore, the novelty term is modeled with a Dirichlet Process mixture model to flexibly capture any departure from the known patterns. Brand was originally estimated using MCMC schemes, which are prohibitively costly when applied to high-dimensional data. To scale up Brand applicability to large datasets, we propose to resort to a variational Bayes approach, providing an efficient algorithm for posterior approximation. We demonstrate a significant gain in efficiency and excellent classification performance with thorough simulation studies. Finally, to showcase its applicability, we perform a novelty detection analysis using the openly available Statlog dataset, a large collection of satellite imaging spectra, to search for novel soil types.
Language: English
Keywords: Novelty detection; Dirichlet process; Variational inference; Large datasets; Nested mixtures; Bayesian modeling
Sector SECS-S/01 - Statistics
Article
Peer review: anonymous experts
Scientific publication
Funding: Assegnazione Dipartimenti di Eccellenza 2023-2027 (Departments of Excellence allocation) - Dipartimento di Economia, Management e Metodi Quantitativi
DECC23_006
Ministero dell'Università e della Ricerca (Italian Ministry of University and Research)
2023
4 Dec 2023
Springer Nature
Pages 1-23 (23 pages)
Epub ahead of print
Journal of international relevance
Journal with Impact Factor
L. Benedetti, E. Boniardi, L. Chiani, J. Ghirri, M. Mastropietro, A. Cappozzo, F. Denti
Files in this item:
s11634-023-00569-z.pdf (open access)
Description: Regular Article
Type: Publisher's version/PDF
Size: 1.42 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1039374