The term ‘combinatorial mixtures’ refers to a flexible class of models for inference on mixture distributions whose components have multidimensional parameters. The idea is to allow each element of the component-specific parameter vector to be shared by a subset of other components. We develop Bayesian inference and computational approaches for this class of mixture distributions with an unknown number of components. We define the structure for a general prior distribution, itself a mixture of prior distributions, in which positive probability is placed on every possible combination of sharing patterns. This partial sharing provides generality and flexibility in comparison with traditional approaches to mixture modeling, while still allowing significant mass to be assigned to models that are more parsimonious than the general ‘no sharing’ case. We illustrate our combinatorial mixtures in an application based on the normal mixture model for any number of components. We introduce normal mixture models for univariate and bivariate data, which are amenable to Markov Chain Monte Carlo computing. In light of combinatorial mixtures, we assume a decomposition of the variance-covariance matrix that separates out standard deviations and correlations, and thus allows us to model those parameters separately. Moreover, to provide valid posterior estimates of the parameters, we introduce a novel solution to the well-known ‘label switching’ problem and compare it with existing ones. This development was originally motivated by applications in molecular biology, where one deals with continuous measures, such as RNA or protein levels, that vary across unknown biological subtypes. In some cases, subtypes are characterized by an increase in the level of the marker measured, while in others they are characterized by variability in otherwise tightly controlled processes, or by the presence of otherwise weak correlations. Also, several mechanisms can coexist.
The approach may also make it possible to model an interesting phenomenon observed in microarray analysis, in which two variables have the same mean and variance but opposite correlations in diseased and normal samples. We use data on the molecular classification of lung cancer from the web-based supporting information for the published manuscript Garber et al. (2001).
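To make the variance-covariance decomposition more concrete, here is a minimal sketch for the bivariate case. It assumes the standard factorization Sigma = D R D, with D the diagonal matrix of standard deviations and R the correlation matrix; the abstract does not state the exact parameterization used, so this form is an assumption. The function name `covariance_from_sd_corr` and all numeric values are hypothetical and chosen only for illustration. The example builds two components that share their standard deviations but have opposite correlations, as in the microarray phenomenon described above.

```python
from typing import List


def covariance_from_sd_corr(sd_x: float, sd_y: float, rho: float) -> List[List[float]]:
    """Build a 2x2 covariance matrix from standard deviations and a correlation.

    Assumed decomposition (not confirmed by the source): Sigma = D R D,
    where D = diag(sd_x, sd_y) and R = [[1, rho], [rho, 1]].
    """
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    if not -1.0 < rho < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    off_diagonal = rho * sd_x * sd_y
    return [[sd_x ** 2, off_diagonal], [off_diagonal, sd_y ** 2]]


# Hypothetical values: both components share the same standard deviations,
# and only the correlation differs between them.
shared_sd_x, shared_sd_y = 1.5, 2.0
sigma_normal = covariance_from_sd_corr(shared_sd_x, shared_sd_y, rho=0.6)
sigma_diseased = covariance_from_sd_corr(shared_sd_x, shared_sd_y, rho=-0.6)

print("normal:  ", sigma_normal)
print("diseased:", sigma_diseased)
```

Because the standard deviations and the correlation enter as separate arguments, a sharing pattern can tie the standard deviations across components while leaving the correlation free, which is the kind of partial sharing the abstract describes.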
Combinatorial mixtures of multiparameter distributions: an application to microarray data / V. Edefonti, G. Parmigiani. (Paper presented at the Joint Meeting of the International Biometric Society (IBS) Austro-Swiss and Italian Regions, held in Milan in 2015.)
Combinatorial mixtures of multiparameter distributions: an application to microarray data
V. Edefonti, G. Parmigiani
2015