Handling Missing Data in Presence of Categorical Variables: a New Imputation Procedure / P.A. Ferrari, A. Barbiero, G. Manzi - In: New Perspectives in Statistical Modeling and Data Analysis - Studies in Classification, Data Analysis, and Knowledge Organization / [edited by] S. Ingrassia, R. Rocci, M. Vichi. - [s.l] : Springer, 2011. - ISBN 978-3-642-11362-8. - pp. 473-480 [10.1007/978-3-642-11363-5_53]

Handling Missing Data in Presence of Categorical Variables: a New Imputation Procedure

P.A. Ferrari (first author); A. Barbiero (second author); G. Manzi (last author)
2011

Abstract

In this paper we propose a new method for dealing with missing values in categorical data. The proposal is a forward imputation procedure, presented in the context of Nonlinear Principal Component Analysis (NLPCA) used to obtain indicators from a large dataset; the procedure can, however, be readily adopted in other contexts and with other multivariate techniques. We discuss the statistical features of our imputation technique in relation to other missing data treatment methods that are popular among NLPCA users. The performance of our method is then compared with these methods through a simulation study based on a real dataset extracted from the Eurobarometer survey. Missing data are created in the original data matrix, and the methods are compared by how close the NLPCA outcomes obtained after each missing data treatment are to those obtained from the original data. The new procedure provides better results than the other methods under the different conditions considered.
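
The evaluation design described in the abstract (delete values from a complete categorical matrix, treat the missing data, compare against the original) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the deletion mechanism is taken to be completely at random, the treatment shown is plain per-column mode imputation (a simple baseline, not the forward imputation procedure of the paper), and the comparison is cell-level agreement rather than closeness of NLPCA outcomes.

```python
import random
from collections import Counter


def make_missing(data, rate, rng):
    """Copy `data`, setting each cell to None with probability `rate`.

    Assumption: values are deleted completely at random.
    """
    return [[None if rng.random() < rate else v for v in row] for row in data]


def mode_impute(data):
    """Baseline treatment (not the paper's method): fill each None with the
    most frequent observed category of its column."""
    n_cols = len(data[0])
    modes = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        # A column with no observed values cannot be imputed and stays None.
        modes.append(Counter(observed).most_common(1)[0][0] if observed else None)
    return [[modes[j] if v is None else v for j, v in enumerate(row)] for row in data]


def agreement(original, imputed, masked):
    """Share of artificially deleted cells whose imputed value equals the
    original one. Stand-in for comparing NLPCA outcomes."""
    deleted = [(i, j) for i, row in enumerate(masked)
               for j, v in enumerate(row) if v is None]
    if not deleted:
        return 1.0
    hits = sum(original[i][j] == imputed[i][j] for i, j in deleted)
    return hits / len(deleted)
```

A run would call `make_missing` on the complete matrix with a chosen rate and a seeded `random.Random`, apply each treatment to the result, and score it with `agreement`; repeating this over several rates and seeds gives the kind of comparison "under different conditions" that the abstract mentions.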
Sector SECS-S/01 - Statistics
2011
Book Part (author)
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/169132