The Atlas of Data Science Research / S. Picascia, S. Montanelli, S. Salini, S. Verzillo. - In: IEEE ACCESS. - ISSN 2169-3536. - 13:(2025 Oct 06), pp. 175943-175959. [10.1109/access.2025.3618442]
The Atlas of Data Science Research
S. Picascia, S. Montanelli, S. Salini, S. Verzillo
2025
Abstract
The origin and evolution of Data Science (DS) have been a subject of ongoing debate, with perspectives varying across disciplines. Understanding the development of this field requires a data-driven approach that systematically analyzes the scientific literature and provides a practical method for exploring it. In this paper, we present the 'Atlas of Data Science Research' (DS-Atlas), an interactive visualization tool for studying the landscape of the DS field. The DS-Atlas is built on a dataset of approximately 1.3 million scientific publications from the Elsevier Scopus database. It leverages Natural Language Processing, Large Language Models, and dimensionality reduction techniques to generate a semantic representation of DS research. The DS-Atlas provides interactive operations that let users focus on specific areas, filter by keywords and/or time periods, and uncover thematic connections and research trends. We discuss examples of concrete tasks that the DS-Atlas can address, to show how the proposed solution can support scholars in the data-driven analysis of the data science literature. As a further contribution, the paper analyzes the Data Science discipline in terms of the geographical distribution of influential authors, institutions, and journals in the field. The DS-Atlas is publicly available online for exploration and testing.

| File | Size | Format |
|---|---|---|
| The_Atlas_of_Data_Science_Research.pdf (open access; type: publisher's version/PDF; license: Creative Commons) | 6.39 MB | Adobe PDF |
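The abstract names the main ingredients of the DS-Atlas without detailing them here:

- a semantic representation of each publication;
- dimensionality reduction to a navigable map;
- filtering by keywords and/or time periods.

The following is a minimal sketch of those mechanics under explicit assumptions, not the authors' implementation.

- **Embeddings:** these are assumed to be precomputed by some text-embedding model.
- **Reduction technique:** PCA (computed by power iteration) is swapped in as a stand-in. The abstract does not say which dimensionality reduction technique the DS-Atlas actually uses.
- **Keyword matching:** "any keyword matches" is an assumption.
- **Names:** `Paper`, `pca_2d`, and `filter_papers` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record layout; the DS-Atlas data schema is not described here.
    title: str
    year: int
    keywords: set[str]
    embedding: list[float]  # assumed precomputed by some text-embedding model


def _cov_times(rows, v):
    """Return (X^T X) v for centred rows X without building the d x d matrix."""
    out = [0.0] * len(v)
    for r in rows:
        dot = sum(a * b for a, b in zip(r, v))
        for j, a in enumerate(r):
            out[j] += dot * a
    return out


def pca_2d(vectors, iters=200, seed=0):
    """Project vectors onto their top two principal components via power iteration.

    PCA is an assumed stand-in for whatever reduction the DS-Atlas really uses.
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    rows = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = _cov_times(rows, v)
            # Gram-Schmidt against components already found, so the iteration
            # converges to the next-largest eigenvector instead of the first one.
            for c in components:
                proj = sum(a * b for a, b in zip(w, c))
                w = [a - proj * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w)) or 1.0
            v = [a / norm for a in w]
        components.append(v)
    # 2-D map coordinates: projection of each centred vector on the two components.
    return [tuple(sum(a * b for a, b in zip(r, c)) for c in components) for r in rows]


def filter_papers(papers, keywords=None, years=None):
    """Keep papers sharing at least one keyword (case-insensitive) whose year lies
    in the inclusive range years=(start, end); passing None disables a filter."""
    wanted = {k.lower() for k in keywords} if keywords else None
    selected = []
    for p in papers:
        if years is not None and not (years[0] <= p.year <= years[1]):
            continue
        if wanted is not None and not (wanted & {k.lower() for k in p.keywords}):
            continue
        selected.append(p)
    return selected
```

For example, `filter_papers(papers, keywords=["machine learning"], years=(2018, 2025))` keeps only records tagged with that keyword and published in that window. The coordinates returned by `pca_2d` could then be plotted as the map. At the scale of about 1.3 million publications, a vectorized numerical library would be needed in practice. This pure-Python version only illustrates the mechanics.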