User Classification on Online Social Networks by Post Frequency

Marques Tavares, G.; Mastelini, S.; Barbon Jr., S.

doi:10.5753/sbsi.2017.6076

This paper proposes a technique for classifying user accounts in Online Social Networks (OSN) in order to detect fraud. The main purpose of the classification is to recognize the usage patterns of three types of users: Humans, Bots, and Cyborgs. Classic, well-established Text Mining approaches use textual features from Natural Language Processing (NLP) for classification, but drawbacks such as computational cost and the huge amount of data can arise in real-life scenarios. This work instead uses statistical frequency parameters of user posting to distinguish the types of users without relying on textual content. We ran the experiment on a Twitter dataset and compared five learning-based classification algorithms: Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost). Using the standard parameters of each algorithm, RF and XGBoost achieved accuracies of 88% and 84%, respectively.
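The abstract does not list the exact frequency parameters, so the following is only a minimal sketch of what content-free posting statistics could look like. The function name, the feature names, and the two sample accounts are hypothetical and are not taken from the paper; timestamps are assumed to be given in seconds.

```python
from statistics import mean, pstdev


def posting_frequency_features(timestamps):
    """Summarise one account's posting rhythm from post times in seconds.

    The feature names are illustrative assumptions, not the paper's feature set.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        raise ValueError("need at least two posts to measure intervals")
    # Gaps between consecutive posts, in seconds.
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    span_days = (ts[-1] - ts[0]) / 86400
    return {
        "mean_gap": mean(gaps),
        "std_gap": pstdev(gaps),  # spread of the gaps; 0 for a perfectly regular schedule
        "min_gap": min(gaps),
        "max_gap": max(gaps),
        "posts_per_day": len(ts) / span_days if span_days > 0 else float("inf"),
    }


# Two hypothetical accounts: one posting exactly every hour, one posting irregularly.
scheduled = [i * 3600 for i in range(24)]
irregular = [0, 120, 5400, 30000, 31000, 80000]

scheduled_features = posting_frequency_features(scheduled)
irregular_features = posting_frequency_features(irregular)
```

Feature vectors of this kind, one per account and each paired with a Human, Bot, or Cyborg label, could then be passed to any of the classifiers compared in the paper with their default parameters. No text is tokenised or parsed at any point, which is where the computational saving over NLP-based features would come from.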

User Classification on Online Social Networks by Post Frequency / G. Marques Tavares, S. Mastelini, S. Barbon Jr. - In: 2017: Anais do XIII Simpósio Brasileiro de Sistemas de Informação / [edited by] J.M. David, A. Pimenta Freire. - Porto Alegre : SBC, 2017. - pp. 464-471 [10.5753/sbsi.2017.6076]


G. Marques Tavares; Saulo Mastelini; Sylvio Barbon Jr.

2017



	Keywords
	
				Online Social Networks; Machine Learning; User Classification; Twitter
			
	Scientific-disciplinary sectors of the contribution
	
				Sector INF/01 - Computer Science
			
	Publication date
	
				2017
			
	DOI
	
				https://dx.doi.org/10.5753/sbsi.2017.6076
			
	URL
	
				https://sol.sbc.org.br/index.php/sbsi/article/view/6076
			
	Type
	
				Book Part (author)
			
	Appears in types:
	
				03 - Book contribution

Files in this item:

File	Size	Format
user.pdf (open access; type: Publisher's version/PDF)	417.31 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/772374


IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca