EXPLORATORY AND CONFIRMATORY FACTOR ANALYSIS TO IDENTIFY AND VALIDATE DIETARY PATTERNS: AN APPLICATION TO A CASE-CONTROL STUDY OF GASTRIC CANCER.

Bertuccio, P.

doi:10.13130/bertuccio-paola_phd2011-02-04

In nutritional epidemiology, the use of dietary pattern methods, based on foods or nutrients, has increased substantially over the past several years. Exploratory statistical methods are one way to examine dietary patterns in populations. Of these, exploratory factor analysis (EFA) is a data aggregation procedure that reduces dietary data to meaningful food or nutrient patterns based on the inter-correlations between dietary items. The factors are then named, usually according to the foods or nutrients that contribute most heavily to the pattern, and the patterns can then be used as the primary exposure variables in dietary studies. Several epidemiological studies have used exploratory methods to identify dietary patterns, but few have validated the factors with confirmatory analyses. The purpose of my PhD thesis is to further knowledge of factor analysis methods in nutritional epidemiologic research. In particular, I studied the application of confirmatory factor analysis (CFA) to validate nutrient-dietary patterns derived from EFA. The major difference between these two variants of factor analysis is that in EFA all nutrients load on all factors (a posteriori approach), while in CFA only the nutrients decided on a priori are included. One criterion for the a priori decision could be the magnitude of the nutrient’s loading in an EFA. CFA is a type of structural equation modeling that deals specifically with measurement models, that is, the relationships between measured variables and latent variables (i.e., hypothetical constructs that are not directly measured or observed in the study). This statistical technique therefore allows researchers to test and verify a particular model or factor structure that they believe underlies the variables measured in the study. In this work, the measured variables are the nutrients and the latent variables are the dietary patterns derived from an EFA.
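The measurement model described above can be written compactly in the usual factor-analytic notation. The symbols below are a conventional choice assumed here for illustration; they are not taken from the thesis.

```latex
% x: p x 1 vector of measured nutrients; \xi: k x 1 vector of latent dietary patterns
% \Lambda: p x k matrix of factor loadings; \delta: p x 1 vector of unique (error) terms
x = \Lambda \xi + \delta,
\qquad
\Sigma = \operatorname{Cov}(x) = \Lambda \Phi \Lambda^{\top} + \Theta
% \Phi: k x k covariance matrix of the factors; \Theta: covariance matrix of \delta (usually diagonal)
```

In an EFA every entry of \Lambda is estimated, so each nutrient loads on every factor. In a CFA the entries of \Lambda that are not specified a priori are fixed to zero, and the fit of the implied \Sigma to the observed covariance matrix is what is tested.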
The acceptability of the tested CFA models is usually evaluated with descriptive goodness-of-fit indices. Among these, the comparative fit index (CFI) and the non-normed fit index (NNFI) are the most widely used. By convention, CFI and NNFI values ≥0.90 indicate an acceptable fit. The fit of the model is also judged by the root mean square residual (RMR) and the root mean square error of approximation (RMSEA). By convention, RMR and RMSEA values close to 0 indicate a good fit. The chi-square test was also used to assess the fit of each CFA model. The null hypothesis of this test is that the model fits the data. However, with large samples and real-world data, the chi-square statistic is very frequently significant even if the model provides a good fit. For this reason, the indices above and the chi-square test must be considered together, and it is not uncommon to conclude that a CFA model fits the data even if the chi-square p-value is significant. In my PhD project, I applied EFA to derive nutrient-dietary patterns, based on a set of 28 selected micro- and macro-nutrients, in the context of a case-control study of gastric cancer. To decide how many factors to extract, I fitted and compared different CFA models that tested structures of 2 to 6 latent factors derived from EFA, in which I included only those nutrients with EFA factor loadings ≥0.63. In the CFA models, each included nutrient was allowed to load on only one factor, and its loadings on the other factors were fixed at zero. Since the latent factors in the CFA models were derived from orthogonal EFA solutions, I fixed the factor covariances to zero. Then, to improve the parsimony and interpretability of the CFA solutions, I also tested revised models in which the factor covariances were freed to estimate the relationships between the latent dimensions, and cut-offs other than 0.63 were also considered.
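As a rough illustration of how several of the fit indices named above are obtained from chi-square statistics, the following Python sketch applies the textbook formulas for RMSEA, CFI, NNFI and NFI. The function names and all input values are hypothetical and are not results from the thesis; the "baseline" model is assumed to be the usual independence model, and some software uses N rather than N - 1 in the RMSEA denominator. RMR is omitted because it requires the residual covariance matrix.

```python
import math


def rmsea(chi2_model, df_model, n_obs):
    """Root mean square error of approximation (N - 1 convention assumed)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n_obs - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index, relative to a baseline (independence) model."""
    excess_model = max(chi2_model - df_model, 0.0)
    excess_base = max(chi2_base - df_base, excess_model)
    if excess_base == 0.0:
        return 1.0
    return 1.0 - excess_model / excess_base


def nnfi(chi2_model, df_model, chi2_base, df_base):
    """Non-normed fit index (also known as the Tucker-Lewis index)."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def nfi(chi2_model, chi2_base):
    """Normed fit index."""
    return (chi2_base - chi2_model) / chi2_base


# Hypothetical chi-square values, chosen only to exercise the formulas.
example = dict(chi2_model=850.0, df_model=100, chi2_base=4000.0, df_base=120)
n_subjects = 500  # hypothetical sample size

print("RMSEA:", round(rmsea(example["chi2_model"], example["df_model"], n_subjects), 3))
print("CFI:  ", round(cfi(**example), 3))
print("NNFI: ", round(nnfi(**example), 3))
print("NFI:  ", round(nfi(example["chi2_model"], example["chi2_base"]), 3))
```

With these made-up inputs the NNFI comes out below the CFI, because the NNFI penalizes the model's chi-square per degree of freedom rather than its raw excess over the degrees of freedom.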
Using EFA, the cumulative percentages of variance explained by the six-, five-, four-, three-, and two-factor solutions were approximately 84%, 80%, 75%, 69% and 63%, respectively. I excluded the six-factor solution from the CFA models, since it included a pattern based on a single nutrient only. Across the five- to two-factor solutions, all confirmed factor loadings ranged from 0.5 to 1. The associated t tests (t values greater than 3.291, p<0.001) indicated that the loading of each nutrient was significantly different from zero. The chi-square test gave highly significant p-values for each CFA model, which would lead to rejecting the hypothesis that the models fit the data. However, because of the known problems with this significance test, this finding by itself was not sufficient to reject the models. Across the CFA models with factor covariances free to be estimated, the RMR values were around the 0.1 threshold for an acceptable fit. The RMSEA values were somewhat higher than the threshold for an acceptable fit. Among the CFA models with factor covariances fixed to zero, the CFI values increased as the number of retained factors decreased, from 0.57 for the five-factor model to 0.76 for the two-factor model including nutrients with EFA factor loadings ≥0.70. The CFI values for the CFA models with factor covariances free to be estimated were higher than those for the models with covariances fixed to zero, reaching 0.80 for the two-factor model including nutrients with loadings ≥0.70, quite close to the 0.90 threshold for an acceptable fit. The normed fit index (NFI) values were very similar to those of the CFI, whereas the NNFI values were lower. In conclusion, the results from all CFA models were not very satisfactory.
For this reason, to better understand the performance of this statistical technique, I tested and compared results from CFA applied to simulated datasets with an ad hoc structure, in which each variable was highly correlated with only one factor, for a total of four orthogonal factors. In this setting, I verified that CFA provides satisfactory results, in particular when the sample size is at least 500, although limitations regarding some goodness-of-fit indices remain. Moreover, a different use of CFA could be particularly useful: the confirmed factors could be tested in a different study as true a priori factors. For example, the factors identified in one group could be applied to a different group, using CFA based on the same nutrients to compute scores. The factor scores could then be acceptable and robust markers of nutrient intake patterns at the group level, and may prove useful in studies of diet–disease relationships. Nevertheless, until more experience is gained with factor analysis in nutrition, it will be difficult to define valid criteria for a good fit in this discipline, as well as methodologies for improving fit.
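A simulated dataset of the kind described above, with four orthogonal factors and each variable loading on only one of them, might be generated along the following lines. This is a minimal sketch, not the simulation design used in the thesis: the number of variables per factor (3), the common loading (0.8), the normal distributions, the seed and all function names are assumptions made for illustration. Only the four orthogonal factors and the sample size of 500 come from the text.

```python
import math
import random


def simulate_simple_structure(n_obs=500, n_factors=4, items_per_factor=3,
                              loading=0.8, seed=0):
    """Draw standardized variables that each load on exactly one factor.

    Factors are independent standard normals (hence orthogonal); each
    variable equals loading * factor + unique noise, scaled to unit variance.
    """
    rng = random.Random(seed)
    unique_sd = math.sqrt(1.0 - loading ** 2)
    data = []
    for _ in range(n_obs):
        factors = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]
        row = [loading * f + unique_sd * rng.gauss(0.0, 1.0)
               for f in factors
               for _ in range(items_per_factor)]
        data.append(row)
    return data


def pearson(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


data = simulate_simple_structure()
columns = list(zip(*data))  # one tuple of observations per variable

# Variables 0 and 1 share a factor; variables 0 and 3 do not.
same_factor = pearson(columns[0], columns[1])
different_factor = pearson(columns[0], columns[3])
print("same factor:     ", round(same_factor, 2))
print("different factor:", round(different_factor, 2))
```

Under these assumptions, two variables on the same factor have a population correlation of 0.8 × 0.8 = 0.64, while variables on different factors are uncorrelated; the sample correlation matrix from such data is the input a CFA would then be fitted to.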

EXPLORATORY AND CONFIRMATORY FACTOR ANALYSIS TO IDENTIFY AND VALIDATE DIETARY PATTERNS: AN APPLICATION TO A CASE-CONTROL STUDY OF GASTRIC CANCER / P. Bertuccio ; tutor: Adriano Decarli , tutor: Carlo La Vecchia ; coordinatore: Silvano Milani. Universita' degli Studi di Milano, 2011 Feb 04. 23. ciclo, Anno Accademico 2010. [10.13130/bertuccio-paola_phd2011-02-04].