Principal component analysis (PCA) is a well-established multivariate statistical technique that is widely used for analysing geochemical data. PCA uses eigenvector analysis to redistribute the total variance in a multivariate dataset, with the aim of condensing a large portion of that variance into a smaller number of uncorrelated variables (the principal components). These components are linear combinations of the original variables and reflect the effects of the various phenomena affecting the dataset. PCA may aid in the recognition of geochemical patterns that can be related back to geological processes. Geochemical datasets commonly contain censored variables, where concentrations fall above or below the detection limits; such data are often referred to as “greater thans” or “less thans”. Substitution is the most popular method for dealing with censored data and involves replacing each censored value with a single arbitrary value (usually 1/2 or 1/3 of the lower level of detection). Numerous studies have shown that substitution is inferior to other replacement methods, as it produces a low-range signal that may not exist and may obscure other signals that do exist. It also significantly biases the mean and results in misleading regression models. With regard to PCA, substitution may produce pseudo-correlation between variables, leading to erroneous geological interpretations. Although substitution methods have been known to be flawed for more than 25 years, the practice remains commonplace in research and is referred to by industry as the “industry standard”. Multivariate analysis requires each variable to have a concentration greater than 0 in every sample. Thus, censored data cannot simply be deleted or replaced with 0, as the other elemental relationships for that sample would be lost.
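The eigenvector step described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes NumPy is available, uses randomly generated lognormal values as a stand-in for four element concentrations, and applies a centred log-ratio transform as the preprocessing step. The function names `clr` and `pca_eigen` are hypothetical.

```python
import numpy as np

def clr(X):
    """Centred log-ratio transform; every concentration must be > 0."""
    X = np.asarray(X, dtype=float)
    if np.any(X <= 0):
        raise ValueError("clr requires strictly positive concentrations")
    log_x = np.log(X)
    # Subtract each sample's mean log-concentration (row-wise centring).
    return log_x - log_x.mean(axis=1, keepdims=True)

def pca_eigen(Z):
    """PCA by eigen-decomposition of the covariance matrix of Z (samples x variables)."""
    Zc = Z - Z.mean(axis=0)
    cov = np.cov(Zc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Zc @ eigvecs                       # principal component scores
    explained = eigvals / eigvals.sum()         # share of total variance per component
    return eigvals, eigvecs, scores, explained

# Synthetic stand-in data: 200 samples, 4 hypothetical elements.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 4))
eigvals, loadings, scores, explained = pca_eigen(clr(X))
```

Because the logarithm is undefined at 0, a single zero or deleted value makes the whole sample unusable here, which is why censored values need a positive replacement. Note also that the centred log-ratio covariance matrix is singular, so the last eigenvalue is numerically zero.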
The current study examines the imputation of left-censored Au, Cu, Pb and Zn data from the Kulumadau epithermal Au deposit, Woodlark Island, Papua New Guinea, using standardised probability distribution models. Lower levels of detection were defined using Q–Q and probability plots. Numerous probability distributions were fitted to each element, and goodness-of-fit tests were applied to evaluate which distribution model best approximates each variable. The estimated parameters of the most appropriate probability distribution were then used to create a synthetic dataset. Values below the defined lower level of detection in the synthetic dataset were randomly sampled and used to impute the left-censored data. PCA was then carried out, and the results were compared between grade data using substituted values and the various realisations obtained from the distribution-based imputed values. Prior to PCA, the datasets were transformed using centred log-ratios to circumvent problems associated with closure. PCA results based on distribution-based imputation yielded geological interpretations more consistent with field observations and petrographic studies than those based on the substituted dataset.
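The imputation workflow outlined above can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: it assumes a lognormal model for a single element (the study fitted several distributions and selected one by goodness-of-fit testing, which is skipped here), estimates the parameters by least squares on a normal Q–Q plot of the detected values, and uses made-up data with a hypothetical detection limit of 0.5. The function names are hypothetical, and censored values are represented as `None`.

```python
import math
import random
from statistics import NormalDist

def fit_lognormal_qq(values):
    """Estimate (mu, sigma) of log-concentration from a normal Q-Q plot.

    Censored entries are None and are assumed to occupy the lowest ranks.
    """
    n = len(values)
    detected = sorted(v for v in values if v is not None)
    n_cens = n - len(detected)
    std = NormalDist()
    # Normal quantiles at the plotting positions of the detected ranks.
    z = [std.inv_cdf((r - 0.375) / (n + 0.25)) for r in range(n_cens + 1, n + 1)]
    y = [math.log(v) for v in detected]
    z_bar = sum(z) / len(z)
    y_bar = sum(y) / len(y)
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    sigma = sxy / sxx               # slope of the Q-Q line
    mu = y_bar - sigma * z_bar      # intercept of the Q-Q line
    return mu, sigma

def impute_left_censored(values, lld, rng, n_synth=20000):
    """Replace None entries with random draws from the fitted model below lld."""
    mu, sigma = fit_lognormal_qq(values)
    synthetic = [rng.lognormvariate(mu, sigma) for _ in range(n_synth)]
    below = [s for s in synthetic if s < lld]
    if not below:
        raise ValueError("synthetic dataset has no values below the detection limit")
    return [rng.choice(below) if v is None else v for v in values]

# Made-up example: lognormal concentrations censored at a hypothetical limit.
rng = random.Random(42)
lld = 0.5
true_values = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
censored = [v if v >= lld else None for v in true_values]
imputed = impute_left_censored(censored, lld, rng)
```

Calling `impute_left_censored` again with a differently seeded generator gives another realisation of the imputed dataset, so downstream results can be compared across realisations.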

Imputation of left censored grade data of the Kulumadau epithermal gold deposit and implications for subsequent multivariate analysis / D. Burkett, B.F.J. Kelly, A. Comunian, I. Graham, D. Cohen. (Paper presented at the 22nd Australian Earth Sciences Convention, held in Newcastle in 2014.)

Imputation of left censored grade data of the Kulumadau epithermal gold deposit and implications for subsequent multivariate analysis

A. Comunian;
2014

July 2014
Sector GEO/06 - Mineralogy
Conference Object

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/387419