This thesis investigates the following general constrained clustering problem: given a dimension $d$, an $L^p$-norm, a set $X\subset\R^d$, a positive integer $k$ and a finite set $\mathcal M\subset\N$, find the optimal $k$-partition $\{A_1,...,A_k\}$ of $X$ w.r.t. the $L^p$-norm satisfying $|A_i|\in \mathcal M$, $i=1,...,k$. First of all, we prove that the problem is NP-hard even if $k=2$ (for all $p>1$), or $d=2$ and $|\mathcal M|=2$ (with Euclidean norm). Moreover, we put in evidence that the problem is computationally hard if $p$ is a non-integer rational. When $d=2$, $k=2$ and $\mathcal M=\{m,n-m\}$, we design an algorithm for solving the problem in time $O(n\sqrt[3]m \log^2 n)$ in the case of Euclidean norm; this result relies on combinatorial geometry techniques concerning $k$-sets and dynamic convex hulls. Finally, we study the problem in fixed dimension $d$ with $k=2$; by means of tools of real algebraic geometry and numerical techniques for localising algebraic roots we construct a polynomial-time method for solving the constrained clustering problem with integer $p$ given in unary notation.

EXACT ALGORITHMS FOR SIZE CONSTRAINED CLUSTERING / J. Lin ; supervisor: A. Bertoni ; ph.d. program coordinator: G. Naldi. - : . Universita' degli Studi di Milano, 2012 Mar 09. ((23. ciclo, Anno Accademico 2010. [10.13130/lin-jianyi_phd2012-03-09].

EXACT ALGORITHMS FOR SIZE CONSTRAINED CLUSTERING

Abstract

This thesis investigates the following general constrained clustering problem: given a dimension $d$, an $L^p$-norm, a set $X\subset\R^d$, a positive integer $k$ and a finite set $\mathcal M\subset\N$, find the optimal $k$-partition $\{A_1,...,A_k\}$ of $X$ w.r.t. the $L^p$-norm satisfying $|A_i|\in \mathcal M$, $i=1,...,k$. First of all, we prove that the problem is NP-hard even if $k=2$ (for all $p>1$), or $d=2$ and $|\mathcal M|=2$ (with Euclidean norm). Moreover, we put in evidence that the problem is computationally hard if $p$ is a non-integer rational. When $d=2$, $k=2$ and $\mathcal M=\{m,n-m\}$, we design an algorithm for solving the problem in time $O(n\sqrt[3]m \log^2 n)$ in the case of Euclidean norm; this result relies on combinatorial geometry techniques concerning $k$-sets and dynamic convex hulls. Finally, we study the problem in fixed dimension $d$ with $k=2$; by means of tools of real algebraic geometry and numerical techniques for localising algebraic roots we construct a polynomial-time method for solving the constrained clustering problem with integer $p$ given in unary notation.
Scheda breve Scheda completa Scheda completa (DC)
BERTONI, ALBERTO
NALDI, GIOVANNI
clustering ; size contraints ; NP-hardness ; p-norm ; polynomial-time ; k-set ; dynamic convex hull ; cylindrical algebraic decomposition
Settore INF/01 - Informatica
Settore MAT/03 - Geometria
EXACT ALGORITHMS FOR SIZE CONSTRAINED CLUSTERING / J. Lin ; supervisor: A. Bertoni ; ph.d. program coordinator: G. Naldi. - : . Universita' degli Studi di Milano, 2012 Mar 09. ((23. ciclo, Anno Accademico 2010. [10.13130/lin-jianyi_phd2012-03-09].
Doctoral Thesis
File in questo prodotto:
File
phd_unimi_R07628.pdf

embargo fino al 21/09/2013

Tipologia: Tesi di dottorato completa
Dimensione 573.61 kB
Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/2434/172513