DIETARY PATTERNS AND ESOPHAGEAL CANCER: A POSTERIORI DIETARY PATTERNS IDENTIFIED THROUGH FACTOR ANALYSIS AND CLUSTER ANALYSIS / F. Bravi ; tutor: A. Decarli ; coordinator: A. Decarli. DIPARTIMENTO DI SCIENZE CLINICHE E DI COMUNITA', 2013 Jan 18. 25th cycle, Academic Year 2012. [10.13130/bravi-francesca_phd2013-01-18].

DIETARY PATTERNS AND ESOPHAGEAL CANCER: A POSTERIORI DIETARY PATTERNS IDENTIFIED THROUGH FACTOR ANALYSIS AND CLUSTER ANALYSIS

F. Bravi
2013

Abstract

Background: Because diet is complex and dietary components may interact, dietary patterns have been proposed as a way to describe variation in overall dietary intake in a specific population and to analyze the relationship between diet and cancer risk. In the present work, factor analysis and cluster analysis were used in combination to identify groups of subjects with similar dietary patterns.

Patients and methods: We analyzed data from an Italian case–control study including 304 cases of squamous cell carcinoma of the esophagus and 743 hospital controls. Dietary habits were assessed using a food frequency questionnaire. A posteriori dietary patterns were identified through principal component factor analysis (PCFA) performed on 28 selected nutrients. A varimax rotation was applied to achieve a simpler loading structure. Nutrients with an absolute rotated factor loading greater than or equal to 0.63 on a given pattern were used to name that pattern. For each pattern, participants were grouped into categories according to quartiles of factor scores among the control population, and odds ratios (ORs) and corresponding 95% confidence intervals (CIs) were estimated using unconditional multiple logistic regression models accounting for potential confounding variables. Cluster analysis was then performed on the factor scores obtained from the factor analysis. The main analysis used the k-means method with Euclidean distance. The initial seeds were obtained by first applying a hierarchical method (Ward’s) and cutting the resulting dendrogram at the level corresponding to 6 clusters. Results from the main analysis were compared with those from other clustering solutions identified using the k-means method with Manhattan, Lagrange, and correlation coefficient similarity measure distances, and using the Partitioning Around Medoids method with both Euclidean and Manhattan distances. The identified clusters were characterized by examining, within each cluster, the distribution of several sociodemographic and lifestyle variables and the average consumption of selected nutrients and food groups. ORs were estimated for each of the identified clusters, and the corresponding 95% CIs were obtained using the floating absolute risks method.

Results: PCFA identified five major dietary patterns, which explained about 80% of the total variance in the original nutrients. The Animal products and related components pattern (with high factor loadings on calcium, phosphorus, riboflavin, animal protein, saturated fatty acids, cholesterol, and zinc) was positively related to esophageal cancer risk (OR=1.64, 95% CI: 1.06-2.55). The Vitamins and fiber pattern (with high loadings on vitamin C, total fiber, beta-carotene equivalents, soluble carbohydrates, and total folate) and the Other polyunsaturated fatty acids and vitamin D pattern (with high loadings on other polyunsaturated fatty acids, vitamin D, and niacin) were inversely related to esophageal cancer (OR=0.50, 95% CI: 0.32-0.78, and OR=0.48, 95% CI: 0.31-0.74, respectively). No relationship with this cancer was observed for the Starch-rich pattern (with high loadings on starch, vegetable protein, and sodium; OR=0.80, 95% CI: 0.50-1.28) or the Other fats pattern (with high loadings on linoleic acid, linolenic acid, and vitamin E; OR=1.04, 95% CI: 0.67-1.63). The naming of the factors, based on high factor scores characterizing each pattern, was confirmed by the distributions of selected nutrients and food groups. The subsequent cluster analysis, based on differences in the dietary patterns, yielded 6 clusters. One of them (C3) was characterized by the lowest intakes of all nutrients and food groups considered, while each of the remaining clusters was determined by an extreme value of one dietary pattern. Subjects in the C1 cluster had the highest values of the Vitamins and fiber pattern; subjects in the C2 cluster had the highest values of the Other polyunsaturated fatty acids and vitamin D pattern; the C4 cluster had the highest scores on the Animal products and related components pattern; subjects in the C5 cluster had the highest values of the Other fats pattern; and the C6 cluster had the highest scores on the Starch-rich pattern and the highest intakes of bread and of pasta and rice. Compared with the C3 cluster, significant inverse associations with esophageal cancer were observed for the C1, C5, and C6 clusters (OR=0.59, 95% CI: 0.40-0.88; OR=0.42, 95% CI: 0.20-0.86; and OR=0.60, 95% CI: 0.42-0.86, respectively), which were characterized by high values of the Vitamins and fiber, Other fats, and Starch-rich patterns, respectively. No significant association was observed for the C2 and C4 clusters (OR=0.76, 95% CI: 0.51-1.13, and OR=1.29, 95% CI: 0.80-2.07, respectively).

Conclusion: The combined application of factor and cluster analyses makes it possible to identify key dietary aspects in a specific population and to obtain mutually exclusive groups of subjects who are similar with respect to these characteristics. Both techniques have limitations that arise from the subjective decisions involved in the analyses. In this application, several alternative options were tried to check the robustness and stability of the solutions. Among these complementary analyses, results from PCFA were compared with those from principal axis factoring, and with those from PCFA performed separately in strata of center and gender and in randomly generated split samples. Moreover, the internal consistency of the identified patterns was evaluated using Cronbach’s coefficient alpha. All these checks supported the decisions adopted in the main analyses. As concerns cluster analysis, to limit the influence of the starting point, the initial seeds used in the k-means method were obtained by performing a hierarchical clustering (Ward’s method) and cutting the corresponding dendrogram at the level k=6. Moreover, alternative solutions identified through different methods and distances yielded comparable clustering solutions. Another limitation of cluster analysis is its sensitivity to outliers; however, excluding 8 potential outliers did not materially change the results.
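
The seeding strategy described in the abstract (Ward’s hierarchical clustering used only to supply starting centroids for k-means) can be illustrated with a minimal, pure-Python sketch. This is not the thesis code: the function names `ward_seeds` and `kmeans`, the toy two-dimensional scores, and k=3 are all assumptions made for illustration, whereas the thesis clustered factor scores on five patterns with k=6. A statistical package would normally be used in practice.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_seeds(points, k):
    """Merge clusters by Ward's criterion until k remain; return their centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def kmeans(points, seeds, max_iter=100):
    """Lloyd's k-means with Euclidean distance, started from the given seeds."""
    centers = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = centroid(members)
    return labels, centers


# Hypothetical factor scores: three well-separated groups of three subjects.
scores = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.3),
          (0.1, 5.0), (0.0, 5.2), (0.3, 4.8)]

seeds = ward_seeds(scores, k=3)
labels, centers = kmeans(scores, seeds)
```

Because the seeds come from a deterministic hierarchical step rather than a random draw, repeated runs on the same data give the same partition, which is the motivation the abstract gives for this choice.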
18-Jan-2013
Sector MED/01 - Medical Statistics
cluster analysis ; diet ; dietary patterns ; esophageal cancer ; factor analysis ; nutrients
Tutor: DECARLI, ADRIANO
Coordinator: DECARLI, ADRIANO
Doctoral Thesis
Files in this item:
File: phd_unim_R08573.pdf
Access: Open Access since 30/06/2014
Type: Complete doctoral thesis
Size: 1.48 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/215074