IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

A Novel Assurance Procedure for Fair Data Augmentation in Machine Learning / S. Maghool, P. Ceravolo, F. Berto. - In: AIEB 2024 : Workshop on Implementing AI Ethics through a Behavioural Lens 2024 / edited by L. Nannini, A. Gillard, C. Friedman Levy, A. Ozkes, M. Slavkovik. - [s.l.] : CEUR-WS, 2025 Apr 08. - pp. 25-36 (Paper presented at the 26th European Conference on Artificial Intelligence, held in Santiago de Compostela in 2024).

A Novel Assurance Procedure for Fair Data Augmentation in Machine Learning

S. Maghool (first author); P. Ceravolo (second author); F. Berto (last author)

2025

Abstract

In addressing the limited availability of data for predictive machine learning, we are concerned with the biases that dataset augmentation can introduce. Although advanced algorithms can generate synthetic data that preserves the original data distribution, challenges remain, including the risk of perpetuating social biases. Our approach uses a similarity network representation that treats each data point as a node and generates synthetic points near it. A vector label propagation algorithm, complemented by an exponential kernel for adjusting link weights, labels these synthetic points. The primary goal is to reduce the system’s dependence on sensitive features without excluding them, thereby avoiding the risk of exacerbating biases or reducing data variation. Implemented in a big data ecosystem, our methodology enables continuous evaluation in an evolving domain, addressing data scarcity with a fairness-aware approach.
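The pipeline named in the abstract (nodes in a similarity network, synthetic points generated near each node, label vectors propagated over links weighted by an exponential kernel) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `exp_kernel`, `synthesize`, and `propagate`, the Gaussian jitter used to place synthetic points, the fully connected graph, and the parameters `sigma`, `scale`, and `iters` are all assumptions made for illustration, and the handling of sensitive features described in the paper is not modelled here.

```python
import math
import random


def exp_kernel(a, b, sigma=1.0):
    """Assumed link weight: decays exponentially with Euclidean distance."""
    return math.exp(-math.dist(a, b) / sigma)


def synthesize(points, per_point=2, scale=0.1, seed=0):
    """Create `per_point` jittered copies near each original point (assumed scheme)."""
    rng = random.Random(seed)
    return [
        tuple(x + rng.gauss(0.0, scale) for x in p)
        for p in points
        for _ in range(per_point)
    ]


def propagate(points, label_vecs, synthetic, sigma=1.0, iters=20):
    """Spread label vectors from clamped original nodes to synthetic nodes."""
    n_classes = len(label_vecs[0])
    n_orig = len(points)
    nodes = list(points) + list(synthetic)
    # Original nodes keep their labels; synthetic nodes start from a uniform vector.
    vecs = [list(v) for v in label_vecs]
    vecs += [[1.0 / n_classes] * n_classes for _ in synthetic]
    for _ in range(iters):
        updated = [list(v) for v in vecs]
        for i in range(n_orig, len(nodes)):
            acc = [0.0] * n_classes
            for j, other in enumerate(nodes):
                if j == i:
                    continue
                w = exp_kernel(nodes[i], other, sigma)
                for c in range(n_classes):
                    acc[c] += w * vecs[j][c]
            total = sum(acc)
            # Normalise so each synthetic node holds a distribution over classes.
            updated[i] = [a / total for a in acc]
        vecs = updated
    return vecs[n_orig:]


# Toy data (hypothetical): two well-separated clusters with one-hot label vectors.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
synthetic = synthesize(points)
soft = propagate(points, labels, synthetic, sigma=0.5)
```

Each entry of `soft` is a label vector for one synthetic point. Keeping a full vector rather than a single hard label preserves how strongly each class is supported by nearby nodes, which is one plausible reason to prefer vector propagation in this setting.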

Keywords: Machine Learning; Fairness; Similarity Network; Data Augmentation
Scientific-disciplinary sectors of the contribution (valid from 09/05/2024): Sector INFO-01/A - Informatica (Computer Science)
Publication date: 8 Apr 2025
URL: https://ceur-ws.org/Vol-3948/paper3.pdf
Type: Book Part (author)
Appears in types: 03 - Contribution in volume

Files in this item:

File	Size	Format
paper3.pdf (open access; type: Publisher's version/PDF)	1.42 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1159100
