Big data or big fail? The good, the bad and the ugly and the missing role of statistics / S.M. Iacus. - In: ELECTRONIC JOURNAL OF APPLIED STATISTICAL ANALYSIS: DECISION SUPPORT SYSTEMS AND SERVICES EVALUATION. - ISSN 2037-3627. - 5:11(2014 Dec 28), pp. 4-11. [10.1285/i2037-3627v5n1p4]

Big data or big fail? The good, the bad and the ugly and the missing role of statistics

S.M. Iacus
2014

Abstract

So-called “Big Data” are data we regard as “big” because of their volume, the amount arriving per unit of time, and their lack of structure. The usual sources of big data are administrative repositories, transaction data, and social media and social network feeds. Some define big data as data that cannot be analyzed on a desktop machine or stored on one’s hard disk. These definitions completely miss the point of view of Statistics: they seem tailored more to the advertising campaigns of SaS or storage solutions than to Science. Moreover, recent big failures, such as the famous (or infamous) Google Flu Trends experiment, prompted a series of popular newspaper articles questioning the validity of the information contained in these data, and of Statistics itself, even though none of these bad practices was carried out by statisticians. While Information Technology and Computer Science are good at efficiently retrieving and managing these data, they should soon be brought back into the field of Statistics, where data belong, and this Special Issue of EJASA is one important step in that direction.
big data; social media; unstructured data; statistics
Sector SECS-S/01 - Statistics
Sector MAT/06 - Probability and Mathematical Statistics
http://siba-ese.unile.it/index.php/ejasa_dss/article/view/14509
Article (author)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/254508