Spam has become the platform of choice used by cyber-criminals to spread malicious payloads such as viruses and trojans. In this paper, we consider the problem of early detection of spam campaigns. Collaborative spam detection techniques can deal with large scale email data contributed by multiple sources; however, they have the well-known problem of requiring disclosure of email content. Distance-preserving hashes are one of the common solutions used for preserving privacy of email content while enabling message classification for spam detection. However, distance-preserving hashes are not scalable, thus making large scale collaborative solutions difficult to implement. As a solution, we propose Spamdoop, a Big Data privacy-preserving collaborative spam detection platform built on top of a standard Map Reduce facility. Spamdoop uses a highly parallel encoding technique that enables the detection of spam campaigns in competitive times. We evaluate our system's performance using a huge synthetic spam base and show that our technique performs favorably against the creation and delivery overhead of current spam generation tools.
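To make the idea of a distance-preserving hash concrete, here is a minimal Python sketch. It uses a SimHash-style digest over word shingles, which is an assumption chosen for illustration: the abstract does not name the encoding Spamdoop uses, and this is not the authors' method. The function names `shingles`, `simhash` and `hamming`, and the parameters `k` and `bits`, are hypothetical.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased message."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def simhash(text, bits=64):
    """Compute a SimHash-style digest (illustrative, not Spamdoop's encoding).

    Each shingle votes +1 or -1 on every bit position; the sign of the
    total decides the output bit, so texts sharing most shingles tend to
    agree on most bits.
    """
    counts = [0] * bits
    for sh in shingles(text):
        digest = hashlib.sha256(sh.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two digests differ."""
    return bin(a ^ b).count("1")
```

Under these assumptions, a contributor would share only `simhash(message)` rather than the message text, and a collector would treat digests within a small Hamming distance of each other as candidates for the same campaign. Near-duplicate messages typically produce digests that differ in few bits, while unrelated messages differ in roughly half of them. The threshold is a tuning choice that this sketch does not address.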
Spamdoop: A privacy-preserving Big Data platform for collaborative spam detection / A. Almahmoud, E. Damiani, H. Otrok, Y. Al Hammadi. - In: IEEE TRANSACTIONS ON BIG DATA. - ISSN 2332-7790. - 5:3(2019), pp. 293-304. [10.1109/TBDATA.2017.2716409]
Spamdoop: A privacy-preserving Big Data platform for collaborative spam detection
E. Damiani (second author)
2019
File | Size | Format
---|---|---
Spamdoop_A_Privacy-Preserving_Big_Data_Platform_for_Collaborative_Spam_Detection.pdf (open access; type: publisher's version/PDF) | 1.74 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.