DATA PARTITIONING AND COMPENSATION TECHNIQUES FOR SECURE TRAINING OF MACHINE LEARNING MODELS / L. Mauri ; tutor: E. Damiani ; co-tutor: B. Apolloni ; coordinator: P. Boldi. Dipartimento di Informatica Giovanni Degli Antoni, 18 Jul 2022. 34th cycle, Academic Year 2021.

DATA PARTITIONING AND COMPENSATION TECHNIQUES FOR SECURE TRAINING OF MACHINE LEARNING MODELS

L. Mauri
2022

Abstract

Advances in Machine Learning (ML), coupled with the increased availability of huge amounts of data collected from diverse sources and improvements in computing power, have led to the widespread adoption of ML-based solutions in critical application scenarios. However, ML models intrinsically introduce new security vulnerabilities within the systems into which they are integrated, thereby expanding their attack surface. The security of ML-based systems hinges on the robustness of the ML model employed. By interfering with any phase of the learning process, an adversary can manipulate data to prevent the model from learning the correct correlations or to mislead it into taking potentially harmful actions. Adversarial ML is a recent research field that addresses two topics: the identification of security issues arising from the use of ML models, and the design of defense mechanisms to prevent or mitigate the detrimental effects of attacks. In this dissertation, we investigate how to improve the resilience of ML models against training-time attacks under a black-box knowledge assumption for both the attacker and the defender. The main contribution of this work is a novel defense mechanism that combines ensemble models (an approach traditionally used only to increase a model's generalization capabilities) with security risk analysis. Specifically, the results of the risk analysis in the input data space are used to guide the partitioning of the training data via an unsupervised technique. We then train an ensemble of models, one per partition, and combine their outputs through majority voting to obtain the final prediction. Experiments are carried out on a publicly available dataset to assess the effectiveness of the proposed method. This novel defense technique is complemented by two further contributions: a Distributed Ledger approach that makes tampering with training data less convenient for attackers, and a quantitative index that measures an ML model's performance degradation before and after the deployment of the defense. Taken together, these techniques provide a framework for improving the robustness of the ML lifecycle.
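
The ensemble-based defense summarized above follows a pipeline that can be sketched in code: partition the training data with an unsupervised technique guided by per-sample risk scores, train one model per partition, and combine the per-model predictions by majority voting. The sketch below is only illustrative: the choice of k-means for partitioning, decision trees as base learners, and the `risk_scores` input and function names are assumptions made for this example, not the exact algorithm developed in the thesis.

```python
# Illustrative sketch of risk-guided partitioning + voting ensemble (assumptions noted above).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier


def train_partitioned_ensemble(X, y, risk_scores, n_partitions=5):
    """Partition the training set guided by risk scores and train one model per partition."""
    # Append the risk score as an extra feature so the clustering is risk-aware
    # (assumption: the risk analysis yields one score per training sample).
    features = np.hstack([X, np.asarray(risk_scores).reshape(-1, 1)])
    partitions = KMeans(n_clusters=n_partitions, n_init=10).fit_predict(features)

    models = []
    for p in range(n_partitions):
        idx = partitions == p
        if idx.sum() == 0:  # skip empty partitions, just in case
            continue
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models


def predict_majority(models, X):
    """Combine per-partition predictions via majority voting (integer class labels assumed)."""
    votes = np.stack([m.predict(X) for m in models])  # shape: (n_models, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)


# Example usage (with hypothetical arrays X_train, y_train, risk_scores, X_test):
# models = train_partitioned_ensemble(X_train, y_train, risk_scores, n_partitions=5)
# y_pred = predict_majority(models, X_test)
```

A plausible rationale for this design is that no single model sees the whole (possibly poisoned) training set: a manipulated region of the input space affects only the models trained on the partitions covering it, and their votes can be outweighed by the rest of the ensemble.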
Date: 18 Jul 2022
Field: INF/01 - Informatica (Computer Science)
Keywords: Adversarial Machine Learning; Secure Machine Learning; Machine Learning Model Robustness
Tutor: DAMIANI, ERNESTO
Coordinator: BOLDI, PAOLO
Type: Doctoral Thesis
File: phd_unimi_R12296.pdf (Doctoral Thesis, Adobe PDF, 7.91 MB, open access)

Use this identifier to cite or link to this record: https://hdl.handle.net/2434/932387