DATA PARTITIONING AND COMPENSATION TECHNIQUES FOR SECURE TRAINING OF MACHINE LEARNING MODELS / L. Mauri ; tutor: E. Damiani ; co-tutor: B. Apolloni ; coordinator: P. Boldi. Dipartimento di Informatica Giovanni Degli Antoni, 18 Jul 2022. 34th cycle, Academic Year 2021.

DATA PARTITIONING AND COMPENSATION TECHNIQUES FOR SECURE TRAINING OF MACHINE LEARNING MODELS

L. Mauri
2022

Abstract

Advances in Machine Learning (ML), coupled with the increased availability of huge amounts of data collected from diverse sources and improvements in computing power, have led to the widespread adoption of ML-based solutions in critical application scenarios. However, ML models intrinsically introduce new security vulnerabilities within the systems into which they are integrated, thereby expanding their attack surface. The security of ML-based systems hinges on the robustness of the ML model employed. By interfering with any phase of the learning process, an adversary can manipulate data to prevent the model from learning the correct correlations or to mislead it into taking potentially harmful actions. Adversarial ML is a recent research field that addresses two topics: the identification of security issues arising from the use of ML models, and the design of defense mechanisms to prevent or mitigate the detrimental effects of attacks. In this dissertation, we investigate how to improve the resilience of ML models against training-time attacks under a black-box knowledge assumption for both the attacker and the defender. The main contribution of this work is a novel defense mechanism that combines ensemble models (an approach traditionally used only to increase a model's generalization capabilities) with security risk analysis. Specifically, the results of the risk analysis in the input data space are used to guide the partitioning of the training data via an unsupervised technique. We then train an ensemble of models, one per partition, and combine their outputs through majority voting to obtain the final prediction. Experiments are carried out on a publicly available dataset to assess the effectiveness of the proposed method. This novel defense technique is complemented by two further contributions: a Distributed Ledger approach that makes tampering with training data less convenient for attackers, and a quantitative index that measures an ML model's performance degradation before and after the deployment of the defense. Taken together, these techniques provide a framework for improving the robustness of the ML lifecycle.
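
The ensemble-based defense summarized above follows a pipeline that can be sketched in code: partition the training data with an unsupervised technique guided by per-sample risk scores, train one model per partition, and combine the per-model predictions by majority voting. The sketch below is only illustrative: the choice of k-means for partitioning, decision trees as base learners, and the `risk_scores` input and function names are assumptions made for this example, not the exact algorithm developed in the thesis.

```python
# Illustrative sketch of risk-guided partitioning + voting ensemble (assumptions noted above).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier


def train_partitioned_ensemble(X, y, risk_scores, n_partitions=5):
    """Partition the training set guided by risk scores and train one model per partition."""
    # Append the risk score as an extra feature so the clustering is risk-aware
    # (assumption: the risk analysis yields one score per training sample).
    features = np.hstack([X, np.asarray(risk_scores).reshape(-1, 1)])
    partitions = KMeans(n_clusters=n_partitions, n_init=10).fit_predict(features)

    models = []
    for p in range(n_partitions):
        idx = partitions == p
        if idx.sum() == 0:  # skip empty partitions, just in case
            continue
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models


def predict_majority(models, X):
    """Combine per-partition predictions via majority voting (integer class labels assumed)."""
    votes = np.stack([m.predict(X) for m in models])  # shape: (n_models, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)


# Example usage (with hypothetical arrays X_train, y_train, risk_scores, X_test):
# models = train_partitioned_ensemble(X_train, y_train, risk_scores, n_partitions=5)
# y_pred = predict_majority(models, X_test)
```

A plausible rationale for this design is that no single model sees the whole (possibly poisoned) training set: a manipulated region of the input space affects only the models trained on the partitions covering it, and their votes can be outweighed by the rest of the ensemble.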
Date: 18 Jul 2022
Field: INF/01 - Informatica (Computer Science)
Keywords: Adversarial Machine Learning; Secure Machine Learning; Machine Learning Model Robustness
Tutor: DAMIANI, ERNESTO
Coordinator: BOLDI, PAOLO
Type: Doctoral Thesis
File: phd_unimi_R12296.pdf (Doctoral Thesis, Adobe PDF, 7.91 MB, open access)

Use this identifier to cite or link to this record: https://hdl.handle.net/2434/932387