Identifying and Qualifying Deviant Cases in Clusters of Sequences: The Why and The How

Piccarreta, R.; Struffolino, E.

doi:10.1007/s10680-023-09682-3

Sequence analysis is employed in different fields (e.g., demography, sociology, and political sciences) to describe longitudinal processes represented as sequences of categorical states. In many applications, sequences are clustered to identify relevant types, which reflect the different empirical realisations of the temporal process under study. We explore criteria to inspect internal cluster composition and to detect deviant sequences, that is, cases characterised by rare patterns or outliers that might compromise cluster homogeneity. We also introduce tools to visualise and distinguish the features of regular and deviant cases. Our proposals offer a more accurate and granular description of the data structure, by identifying, besides the most typical types, peculiar sequences that might be interesting from a substantive and theoretical point of view. This analysis could be very useful in applications where, under the assumption of within homogeneity, clusters are used as outcome or explanatory variables in regressions. We demonstrate the added value of our proposal in a motivating application from life-course socio-demography, focusing on Italian women’s employment trajectories and on their link with their mothers’ participation in the labour market across geographical areas.

Identifying and Qualifying Deviant Cases in Clusters of Sequences: The Why and The How / R. Piccarreta, E. Struffolino. - In: EUROPEAN JOURNAL OF POPULATION. - ISSN 0168-6577. - 40:1(2024), pp. 1.1-1.19. [10.1007/s10680-023-09682-3]

Raffaella Piccarreta; E. Struffolino

2024

Keywords: Cluster analysis; Flagged index plot; Index plot; Sequence analysis; Visualisation

Scientific-disciplinary sectors of the article (display only): Sector SPS/09 - Sociology of Economic Processes and Labour

Publication date: 2024

Journal in ANCE: EUROPEAN JOURNAL OF POPULATION

DOI: https://dx.doi.org/10.1007/s10680-023-09682-3

Type: Article (author)

Appears in types: 01 - Journal article

Files in this record:

File	Size	Format
2024_piccarreta_struffolino.pdf (open access; Description: RESEARCH NOTES OR RESEARCH COMMENT; Type: Publisher's version/PDF)	5.8 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1021416

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca