Most histologic classifications of major cancers include large heterogeneous classes. Identifying clinically relevant subgroups within these classes is among the most important challenges in cancer genomics. Our approach is to seek undiscovered subclasses within broad classes by exploiting a potential biological connection between the unclassified group and known classifications that work for tumors at other organ sites. Statistically, this problem can be viewed as semi-supervised learning, in which a known classification is exported to help the clustering procedure. The known classification is learned in the supervised part of the model and then used as a filter to select a subset of variables able to identify meaningful subgroups of samples in the unsupervised part of the model. From this perspective, the identified subgroups can be given the same interpretation as the original ones. Our implementation is a Bayesian parametric model based on normal mixtures and amenable to MCMC computing. Combinatorial mixtures characterize the set of a priori assumptions. The term names a new, more general and flexible class of models for Bayesian parametric inference in which component parameters are allowed to be different or equal, and positive prior mass is put on every possible combination of equalities and inequalities. This is especially important when interpreting cancer clusters, since those may arise from changes in location, scale, or correlations, or from any combination of these. The solution is illustrated using data on the molecular classification of lung cancer, with molecular classes learned in breast cancer.
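As a rough illustration of the idea that positive mass is placed on every combination of equalities and inequalities, the sketch below enumerates those combinations for a two-component normal mixture. It is not the authors' implementation: the block names, the restriction to two components, and the uniform prior are all assumptions made here for illustration.

```python
from itertools import product

# Parameter blocks of a two-component normal mixture.
# These names are illustrative assumptions, not taken from the paper.
BLOCKS = ("location", "scale", "correlation")


def equality_patterns(blocks=BLOCKS):
    """Enumerate every equal/different pattern across the parameter blocks."""
    return [
        dict(zip(blocks, flags))
        for flags in product(("equal", "different"), repeat=len(blocks))
    ]


def uniform_prior(patterns):
    """Assumed prior: the same positive mass on each pattern."""
    mass = 1.0 / len(patterns)
    return [(pattern, mass) for pattern in patterns]


patterns = equality_patterns()
prior = uniform_prior(patterns)

for pattern, mass in prior:
    print(pattern, mass)
```

With three parameter blocks and two components there are 2^3 = 8 patterns, ranging from "all parameters equal" (a single cluster) to "location, scale, and correlation all different". Any prior that gives each pattern positive mass would fit the description in the abstract; the uniform choice is only the simplest one.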
Integrating supervised and unsupervised learning in genomics applications / V. Edefonti, G. Parmigiani. - In: Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics. - [s.l.] : José M. Bernardo, 2006 Jun 01. - pp. 54-55. (Paper presented at the 8th Valencia International Meeting on Bayesian Statistics; World Meeting of the International Society for Bayesian Analysis, held in Alicante in 2006.)
Integrating supervised and unsupervised learning in genomics applications
V. Edefonti, G. Parmigiani
2006