Introduction

In pooled analyses, covariates may not be uniformly defined and coded across studies, and are occasionally not measured in all of them. In this setting, a joint model for the aggregated data (one-stage approach) is often not feasible, and a two-stage approach (see [1] for instance) is a simple, valid and practical alternative: it accommodates differences in design, confounders and data collection across studies and allows better control for confounding. Simulations indicate that, when the individual studies are large, two-stage methods produce exposure estimates and standard errors that are nearly unbiased relative to those obtained from one-stage methods based on generalized linear mixed models (GLMMs). Based on these considerations and on other computational issues, several existing cancer epidemiology consortia recommend in their protocols applying two-stage methods instead of fitting GLMMs directly on the overall sample (see [2] for instance), although they still struggle to harmonize the exposure variables of interest and potential confounders. However, it is unclear how well the two-stage method performs when individual studies are smaller, especially when there are only a few of them [3]. This may become a critical issue as evidence accumulates on the major research questions a consortium was created to address and the time becomes ripe for more specific analyses on subgroups of studies.

The International Head and Neck Cancer Epidemiology (INHANCE) consortium was established in 2004 to help elucidate the aetiology of head and neck cancer by providing opportunities for pooled analyses of individual-level data on a large scale. In its current version, the consortium includes 35 case-control studies, with questionnaire data on over 26,000 cases and 34,000 controls [4]. Only 10 studies in the consortium provided detailed information on nutrient intakes.
The available studies differ in number of subjects, geographical region, and assessment of dietary information. This provides a real-life setting in which to assess how large the difference, if any, is between the effect estimates derived from two-stage methods and those derived from GLMMs. We discuss this point in an application on vitamin C intake and head and neck cancer [5]. The question may have important implications for refining the statistical methods that support the recent global effort towards pooling individual-level data in consortia.

Objectives

We use a specific real-life situation to address the following aspects of the problem:
a) in the presence of heterogeneity among studies, is the popular two-stage approach ‘always’ the right solution?
b) how can a suitable alternative one-stage approach be identified?
c) how can the two approaches be fairly compared?

Methods

To isolate the effect of vitamin C intake from that of non-alcohol energy intake and to improve the comparability of nutrient intakes across studies, we preprocessed the study-specific information on vitamin C intake from natural sources available in the INHANCE consortium by applying the Willett and Stampfer residual method within each study. We identified quintile categories of exposure, defined on both cases and controls, and compared each one with the lowest category of intake.
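As an illustration of the preprocessing step described above, the following minimal sketch (not the authors' code) assumes that the residual method is applied as a simple linear regression of vitamin C intake on energy intake within a single study, and that quintile categories are assigned by rank. The function names and the toy intake values are hypothetical.

```python
from statistics import mean


def energy_adjusted_residuals(nutrient, energy):
    """Residuals from a simple linear regression of nutrient on energy intake.

    Assumption: one call per study, so that adjustment is study-specific.
    """
    m_x, m_y = mean(energy), mean(nutrient)
    sxx = sum((x - m_x) ** 2 for x in energy)
    sxy = sum((x - m_x) * (y - m_y) for x, y in zip(energy, nutrient))
    slope = sxy / sxx
    intercept = m_y - slope * m_x
    # The residual is the part of nutrient intake not explained by energy.
    return [y - (intercept + slope * x) for x, y in zip(energy, nutrient)]


def quintile_category(values):
    """Assign each value a category from 1 (lowest) to 5 (highest) by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    categories = [0] * n
    for rank, i in enumerate(order):
        categories[i] = rank * 5 // n + 1
    return categories


# Hypothetical intakes for ten subjects of one study (mg/day and kcal/day).
vitamin_c = [60, 95, 80, 150, 120, 70, 110, 200, 90, 130]
energy = [1800, 2200, 2000, 2600, 2400, 1900, 2300, 3000, 2100, 2500]

residuals = energy_adjusted_residuals(vitamin_c, energy)
print(quintile_category(residuals))
```

In a case-control analysis, the resulting categories would then enter the logistic models as indicator variables, with category 1 as the reference.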
The odds ratios (ORs) of the association between vitamin C intake and two separate head and neck cancer outcomes (oral and pharyngeal cancer combined, and laryngeal cancer) were derived from several fixed- and random-effects models under the one-stage and two-stage methods. The one-stage models included multiple logistic regression models and GLMMs with a random intercept or with a random intercept and random slope. The two-stage models included standard univariate meta-analysis with fixed or random effects (with several combinations of estimation methods and variance-covariance structures) and multivariate meta-analysis with random effects, which considered all quintile categories of exposure in the same meta-analysis. Heterogeneity between studies was also tested under each of the approaches considered.

Results

Heterogeneity between studies was present for both cancer sites, whether the quintile categories of exposure were considered separately (each compared with the reference category) or simultaneously in the same regression model or meta-analysis. Accordingly, we present results from random-effects models. For oral and pharyngeal cancer combined, the ORs and corresponding 95% confidence intervals (CIs) were similar across the two approaches and the different available options, showing a significant inverse association with higher intakes of vitamin C (significant ORs ranging from 0.54 to 0.58 for the highest quintile category vs the lowest one). However, GLMMs tended to give narrower CIs in the highest quintile categories of exposure. For laryngeal cancer, GLMMs provided the smallest ORs (0.50-0.52 vs 0.69-0.73 for the five alternative versions of the two-stage approach) with the narrowest CIs, and led to the conclusion of a significant protective effect of vitamin C from natural sources on laryngeal cancer. Two-stage approaches pointed in the same direction, but without reaching statistical significance.
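The second stage of the univariate random-effects approach compared above can be illustrated with the DerSimonian and Laird estimator of reference [1]. This is a minimal sketch for a single exposure contrast, not the authors' implementation: the function name is hypothetical, and the study-specific log ORs and standard errors are invented for illustration and are not INHANCE results.

```python
import math


def dersimonian_laird(log_ors, ses):
    """Pool study-specific log odds ratios with a random-effects model.

    Returns (pooled log OR, its standard error, tau^2, Cochran's Q).
    """
    k = len(log_ors)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum_w
    # Cochran's Q statistic for between-study heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    # Method-of-moments estimate of the between-study variance, truncated at 0.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, se_pooled, tau2, q


# Hypothetical first-stage results (highest vs lowest quintile) from four studies.
log_ors = [-0.7, -0.1, -0.5, 0.1]
ses = [0.15, 0.20, 0.25, 0.20]

pooled, se, tau2, q = dersimonian_laird(log_ors, ses)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled OR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f}), "
      f"tau^2 = {tau2:.3f}, Q = {q:.2f}")
```

With few studies the between-study variance is estimated from very little information, which is one reason a one-stage GLMM fitted to the individual-level data may behave differently from this estimator.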

Conclusions

Consortia are becoming increasingly popular in epidemiology, and the statistical methods they use should be clearly described and continuously updated. Two-stage approaches may not always be the right solution for the analysis of individual-level pooled data. Indeed, we showed that, when there are only a few studies with different numbers of participants and the exposure has more than two categories, a one-stage approach based on GLMMs can be implemented effectively and should be preferred to the traditional two-stage random-effects approach.

References

1. DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986;7:177–188.
2. Raimondi S, Gandini S, et al. Melanocortin-1 receptor, skin cancer and phenotypic characteristics (M-SKIP) project: study design and methods for pooling results of genetic epidemiological studies. BMC Med Res Methodol 2012;12:116.
3. Stukel TA, Demidenko E, et al. Two-stage methods for the analysis of pooled data. Stat Med 2001;20(14):2115–2120.
4. Conway DI, et al. Enhancing epidemiologic research on head and neck cancer: INHANCE - The international head and neck cancer epidemiology consortium. Oral Oncol 2009;45(9):743–746.
5. Edefonti V, et al. Natural vitamin C intake and the risk of head and neck cancer: A pooled analysis in the international head and neck cancer epidemiology consortium. Int J Cancer 2015;137(2):448–462.

One-stage versus two-stage approaches in individual-level pooled data analysis / V. Edefonti, M. Ferraroni. (Paper presented at the 8th conference "Orizzonti 2020 per la Biostatistica e l’Epidemiologia Clinica: Sfide e Opportunità nell’Era dei Big Data", held in Turin in 2015.)

One-stage versus two-stage approaches in individual-level pooled data analysis

V. Edefonti (first author); M. Ferraroni (last author)
2015

17 September 2015
consortia; pooled analysis; statistical methods; one-stage approach; two-stage approach
Sector MED/01 - Medical Statistics
Società Italiana di Statistica Medica ed Epidemiologia Clinica
http://www.congresso.sismec.info/metodi-biostatistici/
Conference Object
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/338692