This thesis introduces and studies the problem of 1-dimensional bounded clustering: for any fixed p ≥ 1, given reals x1, x2…, xn, and integers k1, k2.., km, determine the partition (A1, A2… Am) of {1, 2, ..., n} with |A1| = k1, |A2| = k2 , … , |Am| = km which minimizes Σk Σi Ak |xi - μk |p where μk is the p-centroid of Ak First, we prove that the optimum partition is contiguous (String Property), that is if i,j  Ak, and xi < xs < xj, then s  Ak . As a consequence, we determine an efficient algorithm for bi-clustering (if p is an integer); however, we show that the general problem is NP-complete, while a relaxed version of it admits a polynomial-time algorithm. When p is not an integer, we prove that the problem of deciding if the centroid μ is less than a given integer is in the Counting Hierarchy CH. As an application, the relaxed clustering algorithm used as a step for solving a problem in the field of Bioinformatics: the Localization of promoter regions in genomic sequences. The results are compared with those obtained through another methodology (MADAP).

PROBLEMI DI CLUSTERING CON VINCOLI: ALGORITMI E COMPLESSITÀ / F. Sacca' ; tutor: A. Bertoni, G. Valentini. Universita' degli Studi di Milano, 2010 Dec 17. 22. ciclo, Anno Accademico 2009. [10.13130/sacca-francesco_phd2010-12-17].

PROBLEMI DI CLUSTERING CON VINCOLI: ALGORITMI E COMPLESSITÀ

F. Sacca'
2010

Abstract

This thesis introduces and studies the problem of 1-dimensional bounded clustering: for any fixed p ≥ 1, given reals x1, x2…, xn, and integers k1, k2.., km, determine the partition (A1, A2… Am) of {1, 2, ..., n} with |A1| = k1, |A2| = k2 , … , |Am| = km which minimizes Σk Σi Ak |xi - μk |p where μk is the p-centroid of Ak First, we prove that the optimum partition is contiguous (String Property), that is if i,j  Ak, and xi < xs < xj, then s  Ak . As a consequence, we determine an efficient algorithm for bi-clustering (if p is an integer); however, we show that the general problem is NP-complete, while a relaxed version of it admits a polynomial-time algorithm. When p is not an integer, we prove that the problem of deciding if the centroid μ is less than a given integer is in the Counting Hierarchy CH. As an application, the relaxed clustering algorithm used as a step for solving a problem in the field of Bioinformatics: the Localization of promoter regions in genomic sequences. The results are compared with those obtained through another methodology (MADAP).
17-dic-2010
Settore INF/01 - Informatica
clustering ; NP-complete
BERTONI, ALBERTO
Doctoral Thesis
PROBLEMI DI CLUSTERING CON VINCOLI: ALGORITMI E COMPLESSITÀ / F. Sacca' ; tutor: A. Bertoni, G. Valentini. Universita' degli Studi di Milano, 2010 Dec 17. 22. ciclo, Anno Accademico 2009. [10.13130/sacca-francesco_phd2010-12-17].
File in questo prodotto:
File Dimensione Formato  
phd_unimi_r07040.pdf

accesso aperto

Tipologia: Tesi di dottorato completa
Dimensione 631.37 kB
Formato Adobe PDF
631.37 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/150055
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact