This thesis investigates the following general constrained clustering problem: given a dimension $d$, an $L^p$-norm, a set $X\subset\R^d$, a positive integer $k$ and a finite set $\mathcal M\subset\N$, find the optimal $k$-partition $\{A_1,...,A_k\}$ of $X$ w.r.t. the $L^p$-norm satisfying $|A_i|\in \mathcal M$, $i=1,...,k$. First of all, we prove that the problem is NP-hard even if $k=2$ (for all $p>1$), or $d=2$ and $|\mathcal M|=2$ (with Euclidean norm). Moreover, we put in evidence that the problem is computationally hard if $p$ is a non-integer rational. When $d=2$, $k=2$ and $\mathcal M=\{m,n-m\}$, we design an algorithm for solving the problem in time $O(n\sqrt[3]m \log^2 n)$ in the case of Euclidean norm; this result relies on combinatorial geometry techniques concerning $k$-sets and dynamic convex hulls. Finally, we study the problem in fixed dimension $d$ with $k=2$; by means of tools of real algebraic geometry and numerical techniques for localising algebraic roots we construct a polynomial-time method for solving the constrained clustering problem with integer $p$ given in unary notation.

EXACT ALGORITHMS FOR SIZE CONSTRAINED CLUSTERING / J. Lin ; supervisor: A. Bertoni ; ph.d. program coordinator: G. Naldi. Universita' degli Studi di Milano, 2012 Mar 09. 23. ciclo, Anno Accademico 2010. [10.13130/lin-jianyi_phd2012-03-09].

EXACT ALGORITHMS FOR SIZE CONSTRAINED CLUSTERING

J. Lin
2012

Abstract

This thesis investigates the following general constrained clustering problem: given a dimension $d$, an $L^p$-norm, a set $X\subset\R^d$, a positive integer $k$ and a finite set $\mathcal M\subset\N$, find the optimal $k$-partition $\{A_1,...,A_k\}$ of $X$ w.r.t. the $L^p$-norm satisfying $|A_i|\in \mathcal M$, $i=1,...,k$. First of all, we prove that the problem is NP-hard even if $k=2$ (for all $p>1$), or $d=2$ and $|\mathcal M|=2$ (with Euclidean norm). Moreover, we put in evidence that the problem is computationally hard if $p$ is a non-integer rational. When $d=2$, $k=2$ and $\mathcal M=\{m,n-m\}$, we design an algorithm for solving the problem in time $O(n\sqrt[3]m \log^2 n)$ in the case of Euclidean norm; this result relies on combinatorial geometry techniques concerning $k$-sets and dynamic convex hulls. Finally, we study the problem in fixed dimension $d$ with $k=2$; by means of tools of real algebraic geometry and numerical techniques for localising algebraic roots we construct a polynomial-time method for solving the constrained clustering problem with integer $p$ given in unary notation.
9-mar-2012
Settore INF/01 - Informatica
Settore MAT/03 - Geometria
clustering ; size contraints ; NP-hardness ; p-norm ; polynomial-time ; k-set ; dynamic convex hull ; cylindrical algebraic decomposition
BERTONI, ALBERTO
NALDI, GIOVANNI
Doctoral Thesis
EXACT ALGORITHMS FOR SIZE CONSTRAINED CLUSTERING / J. Lin ; supervisor: A. Bertoni ; ph.d. program coordinator: G. Naldi. Universita' degli Studi di Milano, 2012 Mar 09. 23. ciclo, Anno Accademico 2010. [10.13130/lin-jianyi_phd2012-03-09].
File in questo prodotto:
File Dimensione Formato  
phd_unimi_R07628.pdf

Open Access dal 22/09/2013

Tipologia: Tesi di dottorato completa
Dimensione 573.61 kB
Formato Adobe PDF
573.61 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/172513
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact