Few are as Good as Many: An Ontology-Based Tweet Spam Detection Approach

Halawi, B.; Mourad, A.; Otrok, H.; Damiani, E.

doi:10.1109/ACCESS.2018.2877685

Because Twitter is so popular, spammers favor it for spreading their commercial messages. Various statistical and behavioral analysis approaches have been proposed for detecting Twitter spam. However, these techniques suffer from several limitations: (1) ongoing changes to Twitter’s streaming API, which constrain access to a user’s list of followers/followees; (2) spammers’ creativity in building diverse messages; (3) the use of embedded links and new accounts; and (4) the need to analyze different characteristics of users without their consent. To address these challenges, we propose a novel ontology-based approach for detecting spam on Twitter during events by analyzing the relationship between ham (legitimate) user tweets and spam. Our approach relies solely on public tweet messages when performing the analysis and classification tasks. Ontologies are derived and used to generate a dictionary that validates real tweet messages from random topics. The similarity ratio between the dictionary and the tweets is used to reflect the legitimacy of the messages. Experiments conducted on real tweet data show that message-to-message techniques achieve a low detection rate compared with our ontology-based approach, which outperforms them by approximately 200% and shows promising scalability for large-scale data analysis.
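The idea of scoring a tweet by its similarity to an event dictionary can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the token-overlap definition of the similarity ratio, the `threshold` value of 0.3, and the sample dictionary are all assumptions made for illustration, and in the paper the dictionary is derived from ontologies rather than written by hand.

```python
import re


def tokenize(text):
    """Lowercase a tweet, drop URLs, and split it into word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def similarity_ratio(tweet, dictionary):
    """Fraction of the tweet's tokens found in the dictionary (assumed definition)."""
    tokens = tokenize(tweet)
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in dictionary)
    return hits / len(tokens)


def classify(tweet, dictionary, threshold=0.3):
    """Label a tweet 'ham' when its ratio reaches the (hypothetical) threshold."""
    return "ham" if similarity_ratio(tweet, dictionary) >= threshold else "spam"


# Hypothetical hand-written dictionary standing in for an ontology-derived one.
event_dictionary = {"marathon", "runners", "finish", "race", "city"}

print(classify("Runners reach the finish line of the city marathon", event_dictionary))
print(classify("Win a free phone now http://example.com", event_dictionary))
```

Because the score depends only on the public tweet text, a sketch like this needs no follower lists or account metadata, which matches the constraint stated in the abstract.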

Few are as Good as Many: An Ontology-Based Tweet Spam Detection Approach / B. Halawi, A. Mourad, H. Otrok, E. Damiani. - In: IEEE ACCESS. - ISSN 2169-3536. - 6(2018), pp. 63890-63904.

Bahia Halawi; Azzam Mourad; Hadi Otrok; E. Damiani

2018

Keywords: Twitter; Meta-data; Spam detection; Text based Analysis; Event spammers; Ontology
Scientific-disciplinary sectors of the article (display only): Sector INF/01 - Computer Science (Informatica)
Project title: TrustwOrthy model-awaRE Analytics Data platfORm
Acronym: TOREADOR
Funder: EUROPEAN COMMISSION
Funding programme: H2020
Contract no.: 688797
Publication date: 2018
Journal in ANCE: IEEE ACCESS
DOI: https://dx.doi.org/10.1109/ACCESS.2018.2877685
Type: Article (author)
Appears in types: 01 - Journal article

Files in this record:

File	Size	Format
08502923.pdf (open access; type: Publisher's version/PDF)	8.53 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/597506

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca