ADAPTIVE FRAMEWORKS FOR KNOWLEDGE EXTRACTION IN HETEROGENEOUS DATA ENVIRONMENTS / S. Picascia ; supervisor: A. Ferrara ; co-supervisor: S. Montanelli ; coordinator: R. Sassi. Dipartimento di Studi Letterari, Filologici e Linguistici, 2025 Dec 19. 38th cycle, Academic Year 2024/2025.

ADAPTIVE FRAMEWORKS FOR KNOWLEDGE EXTRACTION IN HETEROGENEOUS DATA ENVIRONMENTS

S. Picascia
2025

Abstract

The proliferation of unstructured, multimodal data makes effective knowledge extraction difficult: the data are heterogeneous, and meaningful patterns are hard to identify across diverse data types. This thesis proposes SHIFT, the first seed-guided hierarchical topic modelling framework specifically designed for heterogeneous data environments. SHIFT combines unsupervised information extraction with advanced representation learning techniques and incorporates external knowledge bases to enhance semantic understanding. Its modular architecture allows it to be adapted to different data modalities and domain requirements. The framework’s adaptability is demonstrated through applications across distinct domains: legal text analysis, scientific literature understanding, and digital humanities research. Beyond the core SHIFT framework, the thesis presents the development and application of complementary approaches tailored to domain-specific challenges. Extensive evaluation validates the technical effectiveness and practical utility of these frameworks for real-world knowledge extraction. The work contributes to advances in multimodal topic modelling and demonstrates the importance of adaptive, modular approaches for handling the complexity and diversity of contemporary unstructured data across academic and professional contexts.
19 Dec 2025
Sector INFO-01/A - Informatica (Computer Science)
FERRARA, ALFIO
SASSI, ROBERTO
Doctoral Thesis
Files in this record:
phd_unimi_R13753.pdf
Access: open access
Type: Publisher's version/PDF
License: Creative Commons
Size: 6.08 MB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1202718