The Nearest Neighbour Imputation (NNI) method has a long history in missing data imputation. Likewise, multivariate dimensionality reduction techniques preserve as much of the information in the data as possible. Recently, the combined use of these methodologies has been proposed to solve data imputation problems and to exploit as much information as possible from the complete part of the data. In this paper we perform an extensive simulation study to test the performance of this new imputation approach (called “Forward Imputation” - ForImp). We compare the two ForImp methods developed for missing quantitative data with two other imputation techniques regarded as benchmarks. The first ForImp method, ForImpPCA, combines the NNI method with Principal Component Analysis (PCA) as the multivariate data analysis technique; the second, ForImpMahalanobis, uses the Mahalanobis distance for NNI. The benchmarks are Stekhoven and Bühlmann’s missForest method, a nonparametric imputation technique for continuous and/or categorical data based on a random forest, and the Iterative PCA, an algorithmic-type technique that imputes all missing values simultaneously through an iterative use of PCA. The simulation study is based on simulated data with different levels of kurtosis or skewness and different strengths of linear relationship among the variables, so that the performance of the four methods can be compared on various data patterns. The distributions used for these simulated data belong to the families of Multivariate Exponential Power and Multivariate Skew-Normal distributions, respectively. Results tend to favour ForImpMahalanobis, especially in the presence of skewed data with small or negative correlations of the same magnitude, or with a mix of low-level negative and positive correlations, whereas ForImpPCA performs better when a slightly higher level of correlation is present in the data.
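As a rough illustration of the general idea of nearest-neighbour imputation with a Mahalanobis distance, the sketch below fills each incomplete row from its closest complete row. It is not the ForImpMahalanobis algorithm, whose steps the abstract does not describe. The function name `nn_impute_mahalanobis`, the use of complete rows only as donors, and the estimation of the covariance matrix from those complete rows are all assumptions made for this example.

```python
import numpy as np


def nn_impute_mahalanobis(X):
    """Fill NaNs in each incomplete row with values from its nearest complete row.

    Generic single-donor nearest-neighbour imputation (hypothetical helper,
    not the ForImp procedure). Distances are Mahalanobis distances computed
    on the variables observed in the incomplete row.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()

    # Donor pool: rows with no missing entries (an assumption of this sketch).
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]
    if donors.shape[0] < 2:
        raise ValueError("need at least two complete rows to estimate a covariance")

    # Covariance estimated from the complete rows only.
    cov = np.atleast_2d(np.cov(donors, rowvar=False))

    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        if not obs.any():
            continue  # no observed values, so no distance can be computed

        # Inverse covariance restricted to the observed variables;
        # pinv guards against a singular submatrix.
        s_inv = np.linalg.pinv(cov[np.ix_(obs, obs)])
        diff = donors[:, obs] - X[i, obs]

        # Squared Mahalanobis distance from row i to every donor.
        d2 = np.einsum("ij,jk,ik->i", diff, s_inv, diff)

        # Copy the missing entries from the closest donor.
        nearest = donors[np.argmin(d2)]
        out[i, ~obs] = nearest[~obs]
    return out


# Small demonstration on simulated trivariate normal data (illustrative values).
rng = np.random.default_rng(0)
sigma = np.array([[1.0, 0.3, 0.3], [0.3, 1.0, 0.3], [0.3, 0.3, 1.0]])
X_miss = rng.multivariate_normal(np.zeros(3), sigma, size=50)
X_miss[0, 1] = np.nan
X_miss[3, 2] = np.nan
X_imp = nn_impute_mahalanobis(X_miss)
```

Restricting the distance to the observed coordinates of each incomplete row is one simple design choice; other choices, such as using several donors or updating the donor pool as rows are completed, are equally plausible and are not covered here.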

A Comprehensive Simulation Study on the Forward Imputation / N. Solaro, A. Barbiero, G. Manzi, P.A. Ferrari. - [s.l] : Dipartimento di Economia, Management e Metodi Quantitativi, Università degli Studi di Milano, 2015 Feb. (WORKING PAPER SERIES / DIPARTIMENTO DI ECONOMIA POLITICA E AZIENDALE, UNIVERSITÀ DEGLI STUDI DI MILANO)

A Comprehensive Simulation Study on the Forward Imputation

A. Barbiero (second author); G. Manzi (penultimate author); P.A. Ferrari (last author)
2015

Feb 2015
Keywords: correlation; data patterns; kurtosis; Mahalanobis distance; missForest; nearest neighbour imputation; principal component analysis; skewness
Sector SECS-S/01 - Statistics
http://econpapers.repec.org/paper/milwpdepa/2015-04.htm
Working Paper
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/515299