In correlation clustering, we are givennobjects together with a binary similarityscore between each pair of them. The goal is to partition the objects into clustersso to minimise the disagreements with the scores. In this work we investigatecorrelation clustering as an active learning problem: each similarity score can belearned by making a query, and the goal is to minimise both the disagreementsand the total number of queries. On the one hand, we describe simple activelearning algorithms, which provably achieve an almost optimal trade-off whilegiving cluster recovery guarantees, and we test them on different datasets. On theother hand, we prove information-theoretical bounds on the number of queriesnecessary to guarantee a prescribed disagreement bound. These results give a richcharacterization of the trade-off between queries and clustering error.
Correlation Clustering with Adaptive Similarity Queries / M. Bressan, N. Cesa-Bianchi, A. Paudice, F. Vitale (ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS). - In: Advances in Neural Information Processing Systems / [a cura di] H. Wallach, H. Larochelle, A. Beygelzime, F. d'Alché-Buc, E. Fox, R. Garnett. - [s.l] : Curran Associates, Inc., 2019. - pp. 12510-12519 (( Intervento presentato al 32. convegno Neural Information Processing Systems nel 2019.
Correlation Clustering with Adaptive Similarity Queries
M. Bressan;N. Cesa-Bianchi;A. Paudice;F. Vitale
2019
Abstract
In correlation clustering, we are givennobjects together with a binary similarityscore between each pair of them. The goal is to partition the objects into clustersso to minimise the disagreements with the scores. In this work we investigatecorrelation clustering as an active learning problem: each similarity score can belearned by making a query, and the goal is to minimise both the disagreementsand the total number of queries. On the one hand, we describe simple activelearning algorithms, which provably achieve an almost optimal trade-off whilegiving cluster recovery guarantees, and we test them on different datasets. On theother hand, we prove information-theoretical bounds on the number of queriesnecessary to guarantee a prescribed disagreement bound. These results give a richcharacterization of the trade-off between queries and clustering error.File | Dimensione | Formato | |
---|---|---|---|
9417-correlation-clustering-with-adaptive-similarity-queries.pdf
accesso aperto
Tipologia:
Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Dimensione
392.79 kB
Formato
Adobe PDF
|
392.79 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.