Spamdoop: A privacy-preserving Big Data platform for collaborative spam detection / A. Almahmoud, E. Damiani, H. Otrok, Y. Al Hammadi. - In: IEEE TRANSACTIONS ON BIG DATA. - ISSN 2332-7790. - 5:3(2019), pp. 293-304. [10.1109/TBDATA.2017.2716409]

Spamdoop: A privacy-preserving Big Data platform for collaborative spam detection

E. Damiani (second author); 2019

Abstract

Spam has become the platform of choice used by cyber-criminals to spread malicious payloads such as viruses and trojans. In this paper, we consider the problem of early detection of spam campaigns. Collaborative spam detection techniques can deal with large scale email data contributed by multiple sources; however, they have the well-known problem of requiring disclosure of email content. Distance-preserving hashes are one of the common solutions used for preserving privacy of email content while enabling message classification for spam detection. However, distance-preserving hashes are not scalable, thus making large scale collaborative solutions difficult to implement. As a solution, we propose Spamdoop, a Big Data privacy-preserving collaborative spam detection platform built on top of a standard Map Reduce facility. Spamdoop uses a highly parallel encoding technique that enables the detection of spam campaigns in competitive times. We evaluate our system's performance using a huge synthetic spam base and show that our technique performs favorably against the creation and delivery overhead of current spam generation tools.
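As a rough illustration of the idea in the abstract, the sketch below shows how a distance-preserving digest could be grouped with map and reduce steps so that contributors share only fingerprints, never message text. It is not the encoding used by Spamdoop, which this record does not describe: SimHash is swapped in as a well-known stand-in, and the names `simhash`, `map_phase`, `reduce_phase`, and the `BANDS` setting are hypothetical choices made for the example.

```python
import hashlib
from collections import defaultdict

BITS = 64    # fingerprint width (assumption for this sketch)
BANDS = 4    # hypothetical: split each fingerprint into 4 bands of 16 bits


def simhash(text: str) -> int:
    """Return a 64-bit SimHash of the text, built from 3-word shingles.

    Texts that share most shingles tend to get fingerprints that differ
    in few bits, which is the distance-preserving property.
    """
    words = text.lower().split()
    shingles = [" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))]
    weights = [0] * BITS
    for shingle in shingles:
        digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(BITS):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if weights[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Count the bits in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def map_phase(msg_id, fingerprint):
    """Map step: emit one ((band index, band bits), message) pair per band."""
    width = BITS // BANDS
    mask = (1 << width) - 1
    for band in range(BANDS):
        chunk = (fingerprint >> (band * width)) & mask
        yield (band, chunk), (msg_id, fingerprint)


def reduce_phase(pairs):
    """Reduce step: group messages whose fingerprints agree on a whole band."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[key].append(value)
    return buckets
```

Messages that fall into the same bucket are candidates for the same campaign and can then be compared with `hamming` against a chosen threshold. Because each band key is independent, the map step parallelizes across messages, which is the property a MapReduce deployment would rely on.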
Spam Campaign; Privacy-Preserving Analysis; Map Reduce
Sector INF/01 - Computer Science
2019
2016
Article (author)
Files in this item:
File: Spamdoop_A_Privacy-Preserving_Big_Data_Platform_for_Collaborative_Spam_Detection.pdf
Access: open access
Type: Publisher's version/PDF
Size: 1.74 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/506589
Citations
  • PMC: n/a
  • Scopus: 12
  • ISI: 9