K-Means Clustering in Dual Space for Unsupervised Feature Partitioning in Multi-view Learning / C. Mio, G. Gianini, E. Damiani - In: 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS) / edited by G.S. DiBaja, L. Gallo, K. Yetongnon, A. Dipanda, M. CastrillonSantana, R. Chbeir. - [s.l.] : IEEE, 2019. - ISBN 9781538693858. - pp. 1-8 (Paper presented at the 14th International Conference on Signal Image Technology & Internet Based Systems (SITIS), held in Las Palmas de Gran Canaria in 2018) [10.1109/SITIS.2018.00012].

K-Means Clustering in Dual Space for Unsupervised Feature Partitioning in Multi-view Learning

C. Mio; G. Gianini; E. Damiani
2019

Abstract

In contrast to single-view learning, multi-view learning simultaneously trains distinct algorithms on disjoint subsets of features (the views) and optimizes them jointly so that they come to a consensus. Multi-view learning is typically used when the data are described by a large number of features, and it aims to exploit the different statistical properties of distinct views. When the features have no natural grouping, a task must be performed before multi-view learning: multi-view generation (MVG), which consists of partitioning the feature set into subsets (views) that have some desired properties. Given a dataset in the form of a table with a large number of columns, the desired solution of the MVG problem is a partition of the columns that optimizes an objective function encoding typical requirements. If class labels are available, one wants to minimize inter-view redundancy in target prediction and to maximize consistency. If class labels are not available, one simply wants to minimize inter-view redundancy, i.e., the information each view carries about the others. In this work, we approach the MVG problem in the latter, unsupervised setting. Our approach is based on transposing the data table: the original instance rows become columns (the 'pseudo-features'), while the original feature columns become rows (the 'pseudo-instances'). The pseudo-instances can then be partitioned by any suitable standard instance-partitioning algorithm; since the resulting groups are groups of the original features, they are views and constitute a solution to the MVG problem. We demonstrate the approach using k-means on the standard MNIST benchmark dataset of handwritten digits.
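The transpose-then-cluster idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain Euclidean k-means (Lloyd's algorithm) with random initial centroids and no feature scaling, and the helper names `transpose`, `kmeans` and `dual_space_views` are hypothetical. Pure Python is used only to keep the sketch dependency-free.

```python
import random
from typing import List, Sequence

Table = List[List[float]]  # rows = instances, columns = features


def transpose(table: Table) -> Table:
    """Map instance rows to columns: each original feature becomes a pseudo-instance."""
    return [list(column) for column in zip(*table)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Table, k: int, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means (assumed variant); returns one cluster label per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = [-1] * len(points)  # sentinel that never matches a real assignment
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


def dual_space_views(table: Table, n_views: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: group feature indices into views by clustering the transposed table."""
    pseudo_instances = transpose(table)  # one row per original feature
    labels = kmeans(pseudo_instances, n_views, seed=seed)
    views: List[List[int]] = [[] for _ in range(n_views)]
    for feature_index, label in enumerate(labels):
        views[label].append(feature_index)
    return [view for view in views if view]  # drop views left empty by k-means


if __name__ == "__main__":
    # Toy table: 4 instances x 4 features; features 0-1 and 2-3 take similar values.
    toy = [
        [1.0, 1.1, 10.0, 10.2],
        [2.0, 2.1, 11.0, 11.1],
        [1.5, 1.4, 10.5, 10.4],
        [2.5, 2.6, 9.5, 9.6],
    ]
    print(dual_space_views(toy, n_views=2))
```

On MNIST, each of the 784 pixel columns would become one pseudo-instance whose coordinates are that pixel's intensities across all images, so k-means groups pixels with similar intensity profiles into the same view. Whether to standardize each pseudo-instance first (which makes squared Euclidean distance a function of the correlation between features) is a design choice this sketch deliberately leaves out.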
Keywords: Multi-view learning; k-means; dual space clustering; consensus clustering; bagging
Sector INF/01 - Computer Science
Sector ING-INF/05 - Information Processing Systems
   TrustwOrthy model-awaRE Analytics Data platfORm
   TOREADOR
   EUROPEAN COMMISSION
   H2020
   688797

   EVidenced based management of hearing impairments: Public health policy making based on fusing big data analytics and simulaTION
   EVOTION
   EUROPEAN COMMISSION
   H2020
   727521

   THREAT-ARREST Cyber Security Threats and Threat Actors Training - Assurance Driven Multi-Layer, end-to-end Simulation and Training (THREAT-ARREST)
   THREAT-ARREST
   EUROPEAN COMMISSION
   H2020
   786890
Book Part (author)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/663182