We study an active cluster recovery problem where, given a set of n points and an oracle answering queries like “are these two points in the same cluster?”, the task is to recover exactly all clusters using as few queries as possible. We begin by introducing a simple but general notion of margin between clusters that captures, as special cases, the margins used in previous works, the classic SVM margin, and standard notions of stability for center-based clusterings. Under our margin assumptions we design algorithms that, in a variety of settings, recover all clusters exactly using only O(log n) queries. For R m , we give an algorithm that recovers arbitrary convex clusters, in polynomial time, and with a number of queries that is lower than the best existing algorithm by ⇥(m m ) factors. For general pseudometric spaces, where clusters might not be convex or might not have any notion of shape, we give an algorithm that achieves the O(log n) query bound, and is provably near-optimal as a function of the packing number of the space. Finally, for clusterings realized by binary concept classes, we give a combinatorial characterization of recoverability with O(log n) queries, and we show that, for many concept classes in R m , this characterization is equivalent to our margin condition. Our results show a deep connection between cluster margins and active cluster recoverability.

On Margin-Based Cluster Recovery with Oracle Queries / M. Bressan, N. Cesa Bianchi, S. Lattanzi, A. Paudice (ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS). - In: Advances in Neural Information Processing Systems 34 pre-proceedings (NeurIPS 2021) / [a cura di] M. Ranzato, A. Beygelzimer, K. Nguyen, P.S. Liang, J.W. Vaughan, Y. Dauphin. - [s.l] : Curran Associates, 2021. - pp. 1-13 (( Intervento presentato al 34. convegno Neural Information Processing Systems nel 2021.

On Margin-Based Cluster Recovery with Oracle Queries

M. Bressan;N. Cesa Bianchi;A. Paudice
2021

Abstract

We study an active cluster recovery problem where, given a set of n points and an oracle answering queries like “are these two points in the same cluster?”, the task is to recover exactly all clusters using as few queries as possible. We begin by introducing a simple but general notion of margin between clusters that captures, as special cases, the margins used in previous works, the classic SVM margin, and standard notions of stability for center-based clusterings. Under our margin assumptions we design algorithms that, in a variety of settings, recover all clusters exactly using only O(log n) queries. For R m , we give an algorithm that recovers arbitrary convex clusters, in polynomial time, and with a number of queries that is lower than the best existing algorithm by ⇥(m m ) factors. For general pseudometric spaces, where clusters might not be convex or might not have any notion of shape, we give an algorithm that achieves the O(log n) query bound, and is provably near-optimal as a function of the packing number of the space. Finally, for clusterings realized by binary concept classes, we give a combinatorial characterization of recoverability with O(log n) queries, and we show that, for many concept classes in R m , this characterization is equivalent to our margin condition. Our results show a deep connection between cluster margins and active cluster recoverability.
Settore INF/01 - Informatica
2021
https://proceedings.neurips.cc/paper/2021/file/d46e1fcf4c07ce4a69ee07e4134bcef1-Paper.pdf
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
NeurIPS-2021-on-margin-based-cluster-recovery-with-oracle-queries-Paper.pdf

accesso aperto

Tipologia: Post-print, accepted manuscript ecc. (versione accettata dall'editore)
Dimensione 635.51 kB
Formato Adobe PDF
635.51 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/906089
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 0
social impact