Towards Efficient and Secure Analysis of Large Datasets / S. Cimato, S. Nicolo. - In: 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). - [s.l.] : IEEE, 2020. - ISBN 9781728173030. - pp. 1351-1356. (Paper presented at the 44th Annual Computers, Software, and Applications Conference (COMPSAC), 5th IEEE International Workshop on Distributed Big Data Management, held in Madrid in 2020) [10.1109/COMPSAC48688.2020.00-68].

Towards Efficient and Secure Analysis of Large Datasets

S. Cimato (first author); 2020

Abstract

One of the promises of the "big data" revolution is that, through the analysis of large datasets, people will benefit from solutions to many different problems obtained by deploying advanced machine learning models. One of the challenges of this standard approach is that information needs to be centralized in the data center or on the machine where the training phase is performed, which raises many privacy concerns. In this paper we take a step towards secure and efficient processing of large distributed datasets, where the original data reside at different locations and are processed in a privacy-preserving way. In particular, we rely on available technologies to achieve the secure design of a machine learning model by performing the training phase on encrypted data. The case study we examine focuses on forecasting the energy production of wind farms situated in different locations. We show in detail how the machine learning model is created from the available datasets, compare its results with those produced by previous models, and discuss their performance.
machine learning; privacy preserving techniques; secure multi-party computation
Sector INF/01 - Computer Science
Sector ING-INF/05 - Information Processing Systems
Cyber security cOmpeteNce fOr Research anD Innovation (CONCORDIA); European Commission; H2020; 830927
2020
Book Part (author)
Files in this item:

paperBDDM-437.pdf
Access: open access
Type: Post-print, accepted manuscript, etc. (version accepted by the publisher)
Size: 517.62 kB
Format: Adobe PDF

09202485.pdf
Access: restricted access
Type: Publisher's version/PDF
Size: 628.68 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/771388