The Atlas of Data Science Research / S. Picascia, S. Montanelli, S. Salini, S. Verzillo. - In: IEEE ACCESS. - ISSN 2169-3536. - 13:(2025 Oct 06), pp. 175943-175959. [10.1109/access.2025.3618442]

The Atlas of Data Science Research

S. Picascia (first author);
S. Montanelli (second author);
S. Salini (penultimate author);
S. Verzillo (last author)
2025

Abstract

The origin and evolution of Data Science (DS) have been a subject of ongoing debate, with perspectives varying across disciplines. Understanding the development of this field requires a data-driven approach that systematically analyzes the scientific literature and provides a practical method for its exploration. In this paper, we present the 'Atlas of Data Science Research' (DS-Atlas), an interactive visualization tool designed to study the landscape of the DS field. The DS-Atlas is built on a dataset of approximately 1.3 million scientific publications from the Elsevier Scopus database and uses Natural Language Processing, Large Language Models, and dimensionality reduction techniques to generate a semantic representation of DS research. The DS-Atlas provides interactive operations for exploring the dataset: users can focus on specific areas, filter by keywords and/or time periods, and uncover thematic connections and research trends. Examples of concrete tasks that can be addressed with the DS-Atlas are discussed to show how the proposed solution can support scholars in the data-driven analysis of the data science literature. As a further contribution, the paper presents an analysis of the Data Science discipline in terms of the geographical distribution of influential authors, institutions, and journals in the field. The DS-Atlas is publicly available online for exploration and testing.
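The keyword and time-period filtering mentioned in the abstract can be illustrated with a minimal sketch. This is not the DS-Atlas implementation: the `Publication` record, the `filter_publications` helper, and the sample data are all hypothetical and chosen only to show the idea of narrowing a publication set by keyword and by an inclusive range of years.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Publication:
    """Hypothetical record; the actual DS-Atlas schema is not given in the abstract."""
    title: str
    year: int
    keywords: Tuple[str, ...]


def filter_publications(
    pubs: List[Publication],
    keyword: Optional[str] = None,
    start_year: Optional[int] = None,
    end_year: Optional[int] = None,
) -> List[Publication]:
    """Keep publications matching an optional keyword and an optional inclusive year range."""
    selected = []
    for pub in pubs:
        if keyword is not None:
            needle = keyword.lower()
            # Case-insensitive substring match against each keyword of the record.
            if not any(needle in k.lower() for k in pub.keywords):
                continue
        if start_year is not None and pub.year < start_year:
            continue
        if end_year is not None and pub.year > end_year:
            continue
        selected.append(pub)
    return selected


# Made-up sample records, for illustration only.
SAMPLE = [
    Publication("Example paper A", 2012, ("machine learning", "statistics")),
    Publication("Example paper B", 2019, ("deep learning", "natural language processing")),
    Publication("Example paper C", 2023, ("large language models", "machine learning")),
]

recent_ml = filter_publications(SAMPLE, keyword="machine learning", start_year=2015)
print([p.title for p in recent_ml])
```

Leaving a criterion as `None` disables it, so the same helper covers the "keywords and/or time periods" combinations described above.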
Keywords: Data Science Atlas; Empirical Analysis; Natural Language Processing; Visual Data Exploration
Sector INFO-01/A - Computer Science
6 Oct 2025
Article (author)
Files in this item:
File: The_Atlas_of_Data_Science_Research.pdf
Access: open access
Type: Publisher's version/PDF
License: Creative Commons
Size: 6.39 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1188777