A decade of research in statistics: a topic model approach / F. De Battisti, A. Ferrara, S. Salini. - In: SCIENTOMETRICS. - ISSN 0138-9130. - 103:2(2015 May), pp. 413-433. [10.1007/s11192-015-1554-1]

A decade of research in statistics: a topic model approach

F. De Battisti; A. Ferrara; S. Salini
2015

Abstract

Topic models are a well-known clustering approach for textual data, with promising applications in bibliometrics for discovering scientific topics and trends in a corpus of scientific publications. However, topic models per se provide only poorly descriptive metadata about the discovered clusters of publications, and these clusters are not related to the other important metadata usually available with publications, such as the authors' affiliations, publication venue, and publication year. In this paper, we propose a methodological approach to topic modeling and to the post-processing of topic model results, with the aim of describing a field of research in depth over time. In particular, we work on a selection of publications from the international statistical literature, propose an approach for identifying sophisticated topic descriptors, and analyze the links between topics and their temporal evolution.
Keywords: Clustering; Probabilistic topic models; Scientometrics; Text mining
Scientific-disciplinary sector SECS-S/01 - Statistics
Scientific-disciplinary sector INF/01 - Computer Science
May 2015
Article (author)
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise stated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/270552
Citations
  • PubMed Central: n/a
  • Scopus: 54
  • Web of Science (ISI): 48