A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns / N. Solaro, A. Barbiero, G. Manzi, P.A. Ferrari. - In: JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION. - ISSN 0094-9655. - 88:18(2018 Dec 12), pp. 3588-3619.

A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns

N. Solaro (first author); A. Barbiero (second author); G. Manzi (penultimate author); P.A. Ferrari (last author)
2018

Abstract

An extensive investigation via simulation is carried out with the aim of comparing three nonparametric, single imputation methods in the presence of multiple data patterns. The ultimate goal is to provide useful hints for users needing to quickly pick the most effective imputation method among the following: Forward Imputation (ForImp), considered in the two variants of ForImp with the principal component analysis (PCA), which alternates the use of PCA and the Nearest-Neighbour Imputation (NNI) method in a forward, sequential procedure, and ForImp with the Mahalanobis distance, which involves the use of the Mahalanobis distance when performing NNI; the iterative PCA technique, which imputes missing values simultaneously via PCA; the missForest method, which is based on random forests and is developed for mixed-type data. The performance of these methods is compared under several data patterns characterized by different levels of kurtosis or skewness and correlation structures.
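As a rough illustration of what imputing missing values "simultaneously via PCA" can look like, the following minimal sketch alternates between a low-rank PCA reconstruction of the completed data matrix and an update of the missing cells. It is a generic textbook-style version written with NumPy, not the implementation evaluated in the paper. The function name `iterative_pca_impute` and its parameters (`n_components`, `max_iter`, `tol`) are hypothetical choices made for this example.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, max_iter=500, tol=1e-8):
    """Fill NaN cells of X by alternating a rank-k PCA fit and re-imputation.

    Hypothetical helper for illustration only; it is an assumption of this
    sketch, not the procedure used by the paper's authors.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start from the column means computed on the observed entries only.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(max_iter):
        # Centre the completed matrix and build its rank-k SVD reconstruction.
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = n_components
        approx = (U[:, :k] * s[:k]) @ Vt[:k] + mu

        # Only the missing cells are overwritten; observed values stay fixed.
        updated = np.where(missing, approx, X)
        change = np.sum((updated - filled) ** 2)
        filled = updated
        if change < tol:
            break
    return filled


# Toy data: the second column is twice the first, with one value removed.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan],
              [4.0, 8.0], [5.0, 10.0], [6.0, 12.0]])
completed = iterative_pca_impute(X, n_components=1, tol=1e-12)
```

All missing cells are updated together at each pass. This contrasts with the forward, sequential logic of ForImp described above, where imputation proceeds step by step.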
forward imputation; iterative principal component analysis; Mahalanobis distance; missForest; missing data; Monte Carlo simulation; multivariate exponential power distribution; multivariate skew-normal distribution; nearest-neighbour imputation
Sector SECS-S/01 - Statistics
12 Dec 2018
Article (author)
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/594064