We study the cluster recovery problem in the semi-supervised active clustering framework. Given a finite set of input points, and an oracle revealing whether any two points lie in the same cluster, our goal is to recover all clusters exactly using as few queries as possible. To this end, we relax the spherical k-means cluster assumption of Ashtiani et al. to allow for arbitrary ellipsoidal clusters with margin. This removes the assumption that the clustering is center-based (i.e., defined through an optimization problem), and includes all those cases where spherical clusters are individually transformed by any combination of rotations, axis scalings, and point deletions. We show that, even in this much more general setting, it is still possible to recover the latent clustering exactly using a number of queries that scales only logarithmically with the number of input points. More precisely, we design an algorithm that, given n points to be partitioned into k clusters, uses O(k 3 ln k ln n) oracle queries and Oe(kn + k 3 ) time to recover the clustering with zero misclassification error. The O(·) notation hides an exponential dependence on the dimensionality of the clusters, which we show to be necessary thus characterizing the query complexity of the problem. Our algorithm is simple, easy to implement, and can also learn the clusters using low-stretch separators, a class of ellipsoids with additional theoretical guarantees. Experiments on large synthetic datasets confirm that we can reconstruct clusterings exactly and efficiently.

Exact Recovery of Mangled Clusters with Same-Cluster Queries / M. Bressan, N. Cesa Bianchi, S. Lattanzi, A. Paudice - In: Advances in Neural Information Processing Systems / [a cura di] H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, H. Lin. - [s.l] : Curran Associates, 2020. - pp. 9324-9334 (( Intervento presentato al 34. convegno Conference on Neural Information Processing Systems (NeurIPS 2020) tenutosi a Vancouver nel 2020.

Exact Recovery of Mangled Clusters with Same-Cluster Queries

M. Bressan;N. Cesa Bianchi;A. Paudice
2020

Abstract

We study the cluster recovery problem in the semi-supervised active clustering framework. Given a finite set of input points, and an oracle revealing whether any two points lie in the same cluster, our goal is to recover all clusters exactly using as few queries as possible. To this end, we relax the spherical k-means cluster assumption of Ashtiani et al. to allow for arbitrary ellipsoidal clusters with margin. This removes the assumption that the clustering is center-based (i.e., defined through an optimization problem), and includes all those cases where spherical clusters are individually transformed by any combination of rotations, axis scalings, and point deletions. We show that, even in this much more general setting, it is still possible to recover the latent clustering exactly using a number of queries that scales only logarithmically with the number of input points. More precisely, we design an algorithm that, given n points to be partitioned into k clusters, uses O(k 3 ln k ln n) oracle queries and Oe(kn + k 3 ) time to recover the clustering with zero misclassification error. The O(·) notation hides an exponential dependence on the dimensionality of the clusters, which we show to be necessary thus characterizing the query complexity of the problem. Our algorithm is simple, easy to implement, and can also learn the clusters using low-stretch separators, a class of ellipsoids with additional theoretical guarantees. Experiments on large synthetic datasets confirm that we can reconstruct clusterings exactly and efficiently.
Settore INF/01 - Informatica
https://proceedings.neurips.cc/paper/2020/file/6950aa02ae8613af620668146dd11840-Paper.pdf
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
NeurIPS-2020-exact-recovery-of-mangled-clusters-with-same-cluster-queries-Paper.pdf

accesso aperto

Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Dimensione 703.39 kB
Formato Adobe PDF
703.39 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

Caricamento pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/815841
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? ND
social impact