In correlation clustering, we are givennobjects together with a binary similarityscore between each pair of them. The goal is to partition the objects into clustersso to minimise the disagreements with the scores. In this work we investigatecorrelation clustering as an active learning problem: each similarity score can belearned by making a query, and the goal is to minimise both the disagreementsand the total number of queries. On the one hand, we describe simple activelearning algorithms, which provably achieve an almost optimal trade-off whilegiving cluster recovery guarantees, and we test them on different datasets. On theother hand, we prove information-theoretical bounds on the number of queriesnecessary to guarantee a prescribed disagreement bound. These results give a richcharacterization of the trade-off between queries and clustering error.

Correlation Clustering with Adaptive Similarity Queries / M. Bressan, N. Cesa-Bianchi, A. Paudice, F. Vitale (ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS). - In: Advances in Neural Information Processing Systems / [a cura di] H. Wallach, H. Larochelle, A. Beygelzime, F. d'Alché-Buc, E. Fox, R. Garnett. - [s.l] : Curran Associates, Inc., 2019. - pp. 12510-12519 (( Intervento presentato al 32. convegno Neural Information Processing Systems nel 2019.

Correlation Clustering with Adaptive Similarity Queries

M. Bressan;N. Cesa-Bianchi;A. Paudice;F. Vitale
2019

Abstract

In correlation clustering, we are givennobjects together with a binary similarityscore between each pair of them. The goal is to partition the objects into clustersso to minimise the disagreements with the scores. In this work we investigatecorrelation clustering as an active learning problem: each similarity score can belearned by making a query, and the goal is to minimise both the disagreementsand the total number of queries. On the one hand, we describe simple activelearning algorithms, which provably achieve an almost optimal trade-off whilegiving cluster recovery guarantees, and we test them on different datasets. On theother hand, we prove information-theoretical bounds on the number of queriesnecessary to guarantee a prescribed disagreement bound. These results give a richcharacterization of the trade-off between queries and clustering error.
Settore INF/01 - Informatica
2019
http://papers.nips.cc/paper/9417-correlation-clustering-with-adaptive-similarity-queries.pdf
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
9417-correlation-clustering-with-adaptive-similarity-queries.pdf

accesso aperto

Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Dimensione 392.79 kB
Formato Adobe PDF
392.79 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/691411
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 13
  • ???jsp.display-item.citation.isi??? 1
social impact