IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Optimal Design of Experiments and Model-Based Survey Sampling in Big Data / L. Deldossi, C. Tommasi - In: Annual ENBIS Conference / [edited by] J. Bischoff. - Budapest : Mathematical Institute of Eotvos Lorand University, Budapest, 2019. - ISBN 9789634891468. - pp. 37-37 (Paper presented at the 19th Annual ENBIS Conference, held in Budapest in 2019).

Optimal Design of Experiments and Model-Based Survey Sampling in Big Data

L. Deldossi;C. Tommasi

2019

Abstract

Big Data are generally huge quantities of digital information, accrued automatically and/or merged from several sources, and rarely result from properly planned population surveys. A Big Dataset is herein conceived as a collection of information concerning a finite population. Since analysing an entire Big Dataset can require enormous computational effort, we suggest selecting a sample of observations and using this sample information to achieve the inferential goal. Instead of the design-based survey sampling approach (which concerns the estimation of finite-population summary measures such as means, totals and proportions), we consider the model-based sampling approach, which involves inference about the parameters of a super-population model. This model is assumed to have generated the finite population values, i.e. the Big Dataset. Given a super-population model, we can apply the theory of optimal design to draw from the Big Dataset a sample that contains most of the information about the unknown parameters of interest. In addition, since a Big Dataset might provide poor information despite its size, we use the definition of design efficiency to suggest a device for measuring the quality of the Big Data.
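
The idea of drawing an informative sample under a super-population model can be illustrated with a minimal sketch. It assumes a linear model with an intercept and two covariates, uses D-optimality (maximising the determinant of the information matrix X'X) as the design criterion, and selects rows with a simple greedy heuristic. None of these choices is taken from the abstract: the function names, the ridge constant, and the simulated data are hypothetical, and the greedy rule is a stand-in for whatever selection algorithm the authors actually use.

```python
import numpy as np

def greedy_d_optimal_subsample(X, n, ridge=1e-4):
    """Greedily pick n rows of X that (approximately) maximise det(X_s' X_s).

    Assumed heuristic: at each step add the unselected row x with the largest
    x' M^{-1} x, since det(M + x x') = det(M) * (1 + x' M^{-1} x).
    """
    N, p = X.shape
    M_inv = np.eye(p) / ridge            # inverse of the ridge-regularised start matrix
    available = np.ones(N, dtype=bool)
    chosen = []
    for _ in range(n):
        # Quadratic form x' M^{-1} x for every row of X
        scores = np.sum((X @ M_inv) * X, axis=1)
        scores[~available] = -np.inf     # never pick the same unit twice
        i = int(np.argmax(scores))
        x = X[i]
        Mx = M_inv @ x
        # Sherman-Morrison update of the inverse after adding x x'
        M_inv -= np.outer(Mx, Mx) / (1.0 + x @ Mx)
        available[i] = False
        chosen.append(i)
    return np.array(chosen)

def log_det_information(X_sub):
    """Log-determinant of the information matrix X_sub' X_sub."""
    _, logdet = np.linalg.slogdet(X_sub.T @ X_sub)
    return logdet

# Simulated "Big Dataset": N units, intercept plus two covariates (assumption)
rng = np.random.default_rng(0)
N, n = 5000, 50
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
p = X.shape[1]

idx_opt = greedy_d_optimal_subsample(X, n)
idx_rnd = rng.choice(N, size=n, replace=False)

# D-efficiency of the random sample relative to the greedy sample
rel_eff = np.exp((log_det_information(X[idx_rnd])
                  - log_det_information(X[idx_opt])) / p)
```

Here `rel_eff` is the D-efficiency of a simple random sample relative to the greedily selected one; a value well below 1 suggests that the random sample carries less information about the model parameters. The same ratio, computed against a theoretically optimal design, is one way an efficiency-based quality measure of the kind mentioned in the abstract could be approached.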

	Keywords
	
				Finite-population sampling; Optimal design theory; Super-population model; Tall data
			
	Scientific-disciplinary sectors of the contribution (display only)
	
				Sector SECS-S/01 - Statistics
			
	Publication date
	
				2019
			
	Type
	
				Book Part (author)
			
	Appears in types:
	
				03 - Contribution in volume

Files in this item:

File	Access	Description	Type	Size	Format
11314_0675125314.pdf	restricted access		Publisher's version/PDF	7.57 MB	Adobe PDF
slides_Enbis_2019.pdf	open access	Slides presented at the ENBIS 2019 conference	Other	1.17 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/697601
