The significance effects problem for administrative data: a novel statistical approach / E. Raffinetti, I. Romeo. (Paper presented at the 2nd SMTDA conference, Stochastic Modeling Techniques and Data Analysis, held in Chania in 2012.)

The significance effects problem for administrative data: a novel statistical approach

E. Raffinetti (first author); I. Romeo
2012

Abstract

Over the last decade, the rapid development of ICT has led administrative agencies to collect ever more data and has considerably improved the quality of those data. On the one hand, administrative data are directly available, inexpensive and typically cover large populations. On the other hand, because they are collected for administrative purposes, they raise problems of accuracy and completeness. Many statistical and computational tools have been developed to study such complex, high-dimensional data-sets, whose size defies simplistic analysis. As is well known in the statistical literature, a very large number of statistical units can lead to biased significance effects. We propose an innovative statistical method for handling large administrative data-sets, based on size reduction obtained through a specific sampling procedure. To validate the method, we compare the statistical analysis of the original data-set with that of the sampled one. The data are provided by Invalsi (National Committee for the Evaluation of the Italian Education Systems). The data-set is novel in that it contains information on students' characteristics and Maths performance in all lower-secondary schools of the Lombardy region. The illustrative application investigates the relationships between Maths scores and both individual and school factors. Given the hierarchical structure of the data, a multilevel model is used.
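The point about significance in very large data-sets can be illustrated with a small sketch. It is not the sampling procedure of the paper, which the abstract does not describe. It assumes a plain two-group z-test with equal group sizes and known unit variance, and the function names and numbers are hypothetical, chosen only for illustration.

```python
import math


def z_statistic(effect_size: float, n_per_group: int) -> float:
    """z statistic for a difference in means between two equal-sized groups.

    Assumes unit variance in both groups (treated as known), so the standard
    error of the difference is sqrt(2 / n_per_group).
    """
    return effect_size * math.sqrt(n_per_group / 2)


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical values, for illustration only.
effect = 0.02  # a practically negligible standardized difference
for n in (1_000, 100_000):
    z = z_statistic(effect, n)
    print(f"n per group = {n:>7}: z = {z:.2f}, p = {two_sided_p(z):.4g}")
```

With the same negligible effect, the larger group size gives a p-value below 0.05 while the smaller one does not. This is the sense in which sheer size can make effects look significant, and why analysing a reduced sample may give a more sober picture.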
Keywords: Administrative data; Large data-sets; Sampling; Significance; Multilevel modeling
Academic field: SECS-S/01 - Statistics
Conference Object
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/219299