Currently used intensity-dependent normalization methods for gene expression data, based on regression smoothing techniques, usually address the two problems of location bias detrending and data re-scaling without taking into account the censoring of certain gene expression values, which arises from experimental measurement constraints or from previous normalization steps. Moreover, control of the bias-variance balance in normalization procedures is seldom discussed and is instead left to the user's experience. Here we present an approximate maximum likelihood procedure for fitting a model that smooths the dependence of log-fold gene expression differences on average gene intensities. Central tendency and scaling factor were modeled by means of B-spline smoothing. As an alternative to outlier theory and robust methods, our approach is to look for suitable distributional models, possibly generalizing the classical Gaussian and Laplacian assumptions, while addressing the problem of censoring. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were adopted for model selection. A Monte Carlo evaluation was performed to investigate the goodness of fit of the selected models. Randomization quantiles were used to produce normally distributed adjusted data. The analysis was performed on a pre-processed, publicly available dataset with censored gene expression data, published in a breast cancer microarray study. Results obtained from the different models suggest that the Asymmetric Laplace distribution produces the best-fitting models. The AIC and BIC favor models with different levels of flexibility for the various arrays; BIC tended to select more parsimonious models. Comparison of model-generated data with observed microarray data indicated reasonable fits for the models evaluated.
The proposed approach provides a way to model the distribution of gene expression data as a function of the mean intensity value, controlling for different types of censoring. Information criteria could help avoid the potential systematic distortion caused by poor control of the bias-variance balance. The Laplace distribution should be considered in future research on parametric error modeling.
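To make the ingredients of the abstract concrete, the following is a minimal sketch of a censored log-likelihood under an asymmetric Laplace error model, the AIC and BIC scores, and randomization quantiles mapped to the normal scale. It rests on several assumptions that are not taken from the preprint: the three-parameter (location `mu`, scale `sigma`, skewness `p`) parameterization of the asymmetric Laplace distribution, constant `mu` and `sigma` (the abstract models both as B-spline functions of average intensity), and all function names, which are hypothetical.

```python
import math
import random
from statistics import NormalDist


def al_logpdf(y, mu, sigma, p):
    """Log-density of the asymmetric Laplace distribution.

    Assumed parameterization: f(y) = p(1-p)/sigma * exp(-rho_p((y-mu)/sigma)),
    with rho_p(u) = u * (p - 1[u < 0]) and 0 < p < 1.
    """
    u = (y - mu) / sigma
    rho = u * (p - (1.0 if u < 0 else 0.0))
    return math.log(p * (1.0 - p) / sigma) - rho


def al_cdf(y, mu, sigma, p):
    """Distribution function matching al_logpdf."""
    u = (y - mu) / sigma
    if u <= 0:
        return p * math.exp((1.0 - p) * u)
    return 1.0 - (1.0 - p) * math.exp(-p * u)


def censored_loglik(data, mu, sigma, p):
    """Log-likelihood for (value, status) pairs.

    status is "obs" for an observed value, "left" or "right" when the value
    is only known to lie below or above the recorded censoring limit.
    """
    total = 0.0
    for y, status in data:
        if status == "obs":
            total += al_logpdf(y, mu, sigma, p)
        elif status == "left":
            total += math.log(al_cdf(y, mu, sigma, p))
        elif status == "right":
            total += math.log(1.0 - al_cdf(y, mu, sigma, p))
        else:
            raise ValueError(f"unknown censoring status: {status!r}")
    return total


def aic(loglik, k):
    """Akaike Information Criterion for k free parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * loglik


def randomized_quantile(y, status, mu, sigma, p, rng):
    """Map one observation to the standard normal scale.

    Observed values use F(y) directly; censored values draw a uniform
    probability from the interval consistent with the censoring.
    """
    f = al_cdf(y, mu, sigma, p)
    if status == "obs":
        u = f
    elif status == "left":
        u = rng.uniform(0.0, f)
    else:
        u = rng.uniform(f, 1.0)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf within its open domain
    return NormalDist().inv_cdf(u)
```

Because the BIC penalty per parameter is log(n) rather than 2, it exceeds the AIC penalty once n is at least 8, which is consistent with the abstract's remark that BIC tends to select more parsimonious models.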
Exploration of distributional models for a novel intensity-dependent normalization / N. Lama, P. Boracchi, E. Biganzoli. - In: COBRA preprint series. - 2006:14(2006 Oct 24), pp. 1-40.
Exploration of distributional models for a novel intensity-dependent normalization
N. Lama; P. Boracchi; E. Biganzoli
2006