Optimal Design of Experiments and Model-Based Survey Sampling in Big Data / L. Deldossi, C. Tommasi - In: Annual ENBIS Conference / edited by J. Bischoff. - Budapest : Mathematical Institute of Eotvos Lorand University, 2019. - ISBN 9789634891468. - p. 37 (Paper presented at the 19th Annual ENBIS Conference, held in Budapest in 2019.)
Optimal Design of Experiments and Model-Based Survey Sampling in Big Data
L. Deldossi, C. Tommasi
2019
Abstract
Big Data are generally huge quantities of digital information that are accrued automatically and/or merged from several sources; they rarely result from properly planned population surveys. A Big Dataset is here conceived as a collection of information concerning a finite population. Since analysing an entire Big Dataset can require enormous computational effort, we suggest selecting a sample of observations and using this sample information to achieve the inferential goal. Instead of the design-based survey sampling approach (which concerns the estimation of finite-population summary measures, such as means, totals, and proportions), we consider the model-based sampling approach, which involves inference about the parameters of a super-population model. This model is assumed to have generated the finite population values, i.e., the Big Dataset. Given a super-population model, we can apply the theory of optimal design to draw a sample from the Big Dataset that contains most of the information about the unknown parameters of interest. In addition, since a Big Dataset might provide poor information despite its size, we use the definition of the efficiency of a design to suggest a device for measuring the quality of the Big Data.
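The abstract does not state the super-population model, the optimality criterion, or the selection algorithm, so the following is only an illustrative sketch under loudly stated assumptions: a linear regression model, the D-optimality criterion (maximising the determinant of the information matrix), and a simple greedy selection rule. The function names `greedy_d_optimal` and `d_efficiency`, the simulated data, and the `ridge` start value are all hypothetical and are not taken from the paper. The sketch picks a subsample that is informative about the regression parameters and then uses a D-efficiency ratio to compare a random subsample, and the full dataset on a per-observation basis, with the selected subsample.

```python
import numpy as np


def greedy_d_optimal(X, n, ridge=1e-6):
    """Greedily pick n rows of X that approximately maximise det(X_s' X_s).

    Assumption: linear model y = X beta + error, so the information
    matrix of a subsample is proportional to X_s' X_s.
    """
    N, p = X.shape
    M_inv = np.eye(p) / ridge            # inverse of the start matrix ridge * I
    available = np.ones(N, dtype=bool)
    chosen = []
    for _ in range(n):
        # d(x) = x' M^{-1} x; adding x multiplies det(M) by 1 + d(x)
        d = np.einsum("ij,jk,ik->i", X, M_inv, X)
        d[~available] = -np.inf          # never pick the same row twice
        i = int(np.argmax(d))
        chosen.append(i)
        available[i] = False
        # Sherman-Morrison update for the inverse of (M + x x')
        u = M_inv @ X[i]
        M_inv -= np.outer(u, u) / (1.0 + X[i] @ u)
    return np.array(chosen)


def log_det_info(X_s):
    """Log-determinant of the per-observation information matrix X_s' X_s / n."""
    _, logdet = np.linalg.slogdet(X_s.T @ X_s / len(X_s))
    return logdet


def d_efficiency(X_a, X_b):
    """D-efficiency of design A relative to design B (below 1 means A is worse)."""
    p = X_a.shape[1]
    return float(np.exp((log_det_info(X_a) - log_det_info(X_b)) / p))


# Simulated "big dataset": intercept plus two standard normal covariates.
rng = np.random.default_rng(0)
N, n = 5000, 50
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])

idx_opt = greedy_d_optimal(X, n)                   # model-based subsample
idx_rnd = rng.choice(N, size=n, replace=False)     # simple random subsample

eff_random = d_efficiency(X[idx_rnd], X[idx_opt])  # random vs. selected subsample
eff_full = d_efficiency(X, X[idx_opt])             # whole dataset, per observation
print(f"random subsample efficiency: {eff_random:.3f}")
print(f"full dataset per-observation efficiency: {eff_full:.3f}")
```

Under these assumptions, an efficiency well below 1 for the full dataset indicates that a typical observation carries little information about the parameters compared with the selected points. This is one possible reading of how design efficiency could serve as a quality measure, not necessarily the device proposed by the authors.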