Most histologic classifications of major cancers include large heterogeneous classes. Identifying clinically relevant subgroups within these classes is among the most important challenges in cancer genomics. Our approach to this challenge is to seek undiscovered subclasses within broad classes, exploiting a potential biological connection between the unclassified group and known classifications that work for tumors at other organ sites. Statistically, this problem can be thought of as semi-supervised learning, where a known classification is exported to help the clustering procedure. The known classification is learned in the supervised part of the model and then used as a filter to select a suitable subset of variables able to identify meaningful subgroups of samples in the unsupervised part of the model. From this perspective, the identified subgroups can be thought of as having the same interpretation as the original ones. Our implementation is a Bayesian parametric model based on Normal mixtures and amenable to MCMC computing. Combinatorial mixtures characterize the set of a priori assumptions. The term "combinatorial mixtures" names a new, more general and flexible class of models for Bayesian parametric inference in which component parameters are allowed to be different or equal, and positive mass is put on every possible combination of equalities and inequalities. This is especially critical in interpreting cancer clusters, as those may arise from changes in location, scale, or correlations, or from any combination of these. The solution is illustrated using data on the molecular classification of lung cancer, with molecular classes learned in breast cancer.
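To make the idea of "every possible combination of equalities and inequalities" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a pattern of equalities among the component parameters can be represented as a set partition of the component labels, and it assigns equal prior mass to every pattern purely for illustration; the abstract only states that each combination receives positive mass. The function names `set_partitions` and `equality_patterns` and the parameter labels are hypothetical.

```python
from itertools import product


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` in a block of its own ...
        yield [[first]] + partition
        # ... or add it to each existing block in turn.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def equality_patterns(n_components, parameter_names=("location", "scale")):
    """List every joint equality pattern with an illustrative uniform prior.

    Components in the same block share the value of that parameter;
    components in different blocks are allowed to differ.
    The uniform prior is an assumption made for this sketch only.
    """
    per_parameter = list(set_partitions(range(n_components)))
    patterns = [
        dict(zip(parameter_names, combo))
        for combo in product(per_parameter, repeat=len(parameter_names))
    ]
    prior_mass = 1.0 / len(patterns)
    return [(pattern, prior_mass) for pattern in patterns]


if __name__ == "__main__":
    for pattern, mass in equality_patterns(2):
        print(pattern, mass)
```

With two components and two parameters, the sketch lists four patterns: both parameters differ, only the location differs, only the scale differs, or the two components coincide. The number of patterns per parameter grows as the Bell number of the component count, which is one reason such models are typically explored by MCMC rather than by enumeration.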
Integrating supervised and unsupervised learning in genomics applications / V. Edefonti, G. Parmigiani. - In: Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics. - [s.l] : José M. Bernardo, 2006 Jun 01. - pp. 54-55. (Paper presented at the 8th Valencia International Meeting on Bayesian Statistics ; World Meeting of the International Society for Bayesian Analysis, held in Alicante in 2006.)
|Title:||Integrating supervised and unsupervised learning in genomics applications|
EDEFONTI, VALERIA CARLA (First author)
|Keywords:||Bayesian inference ; mixture models ; combinatorial mixtures|
|Scientific Disciplinary Sector:||Sector MED/01 - Medical Statistics|
|Publication date:||1 Jun 2006|
|Organizations associated with the conference:||International Society for Bayesian Analysis (ISBA)|
Universitat de Valencia
|Type:||Book Part (author)|
|Appears in types:||03 - Contribution in volume|