Algorithmic imputation techniques for missing data : performance comparisons and development perspectives / N. Solaro, A. Barbiero, G. Manzi, P.A. Ferrari. - In: Analysis and modeling of complex data in behavioural and social sciences : book of short paper / [edited by] A. Okada, D. Vicari, G. Ragozini. - Padova : CLEUP, 2012 Aug. - ISBN 978-88-6129-916-0. (Paper presented at the 12th JCS-CLADAG 12 conference, held in Anacapri in 2012.)

Algorithmic imputation techniques for missing data : performance comparisons and development perspectives

A. Barbiero (second author); G. Manzi (penultimate author); P.A. Ferrari (last author)
2012

Abstract

In recent years, much research has been devoted to solving the problem of missing data imputation. Although most of the novel proposals look attractive in some respect, less attention has been paid to the question of when and why a particular method should be chosen over the others. This matter is crucial in applications, given that unsuitable solutions can heavily affect the reliability of statistical analyses. Starting from this consideration, this work studies how well several algorithmic-type imputation methods perform in the case of quantitative data. We focus on three different imputation approaches, based respectively on random forests, iterative PCA, and the forward procedure. The last of these was initially introduced for ordinal data, so we had to develop an original adaptation that allows it to handle missing quantitative values.
Multivariate exponential power distribution; Multivariate skew-normal distribution; Nearest neighbour; Principal component analysis; Random forest
Academic discipline SECS-S/01 - Statistics
Aug-2012
Società Italiana di Statistica
http://www.jcs-cladag12.tk/
Book Part (author)
Files in this item:
Solaro_et_al-3.pdf (open access)
Type: Publisher's version/PDF
Size: 38.14 kB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/205383