The significance effects problem for administrative data: a novel statistical approach / E. Raffinetti, I. Romeo. (Paper presented at the 2nd SMTDA 2012 conference: Stochastic Modeling Techniques and Data Analysis, held in Chania in 2012.)
The significance effects problem for administrative data: a novel statistical approach
E. Raffinetti; I. Romeo
2012
Abstract
Over the last decade, rapid development in ICT has led administrative agencies to collect ever more data and to improve its quality considerably. On the one hand, administrative data are directly available, inexpensive, and typically cover large populations. On the other hand, because they are collected for administrative purposes, they raise problems of accuracy and completeness. Many statistical and computational tools have been developed to study such complex, high-dimensional datasets, whose size defies simplistic analysis. As is well known in the statistical literature, a very large number of statistical units can lead to biased significance effects. We propose an innovative statistical method for handling large administrative datasets, based on reducing their size through a specific sampling procedure. To validate the method, we compare the statistical analysis of the original dataset with that of the sampled one. The data are provided by Invalsi (National Committee for the Evaluation of the Italian Education Systems). The dataset is novel in that it contains information on students' characteristics and Maths performance in all lower-secondary schools of the Lombardy region. The illustrative application investigates the relationships between Maths scores and both individual and school-level factors. Given the hierarchical structure of the data, a multilevel model is fitted.
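The large-sample significance problem mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' sampling procedure, which the abstract does not specify. It only assumes a two-sample z-test with two equally sized groups and known unit variance; the function name `two_sample_p_value` and the effect size of 0.02 standard deviations are hypothetical choices made for illustration.

```python
import math


def two_sample_p_value(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value of a z-test for a standardized mean difference.

    Assumes two equally sized groups with known unit variance, so the
    test statistic is z = effect_size * sqrt(n_per_group / 2).
    """
    z = effect_size * math.sqrt(n_per_group / 2)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical, practically negligible effect: 0.02 standard deviations.
tiny_effect = 0.02
for n in (500, 5_000, 50_000, 200_000):
    p = two_sample_p_value(tiny_effect, n)
    print(f"n per group = {n:>7}: p = {p:.3g}")
```

With the effect size held fixed, the p-value shrinks as the number of units grows, so a difference of no practical importance eventually appears highly significant. This is the kind of behaviour that motivates analysing a reduced sample alongside the full dataset.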




