This paper proposes a technique for classifying user accounts on social networks to detect fraud in Online Social Networks (OSN). The main purpose of our classification is to recognize the patterns of users from Human, Bots or Cyborgs. Classic and consolidated approaches of Text Mining employ textual features from Natural Language Processing (NLP) for classification, but some drawbacks as computational cost, the huge amount of data could rise in real-life scenarios. This work uses an approach based on statistical frequency parameters of the user posting to distinguish the types of users without textual content. We perform the experiment over a Twitter dataset and as learn-based algorithms in classification task we compared Random Forest (RF), Support Vector Machine (SVM), k-nearest Neighbors (k-NN), Gradient Boosting Machine (GBM) and Extreme Gradient Boosting (XGBoost). Using the standard parameters of each algorithm, we achieved accuracy results of 88% and 84% by RF and XGBoost, respectively.
User Classification on Online Social Networks by Post Frequency / G. MARQUES TAVARES, S. Mastelini, S.B. Jr. - In: 2017: Anais do XIII Simpósio Brasileiro de Sistemas de Informação / [a cura di] J.M. David, A. Pimenta Freire. - Porto Alegre : SBC, 2017. - pp. 464-471 [10.5753/sbsi.2017.6076]
User Classification on Online Social Networks by Post Frequency
G. MARQUES TAVARES;
2017
Abstract
This paper proposes a technique for classifying user accounts on social networks to detect fraud in Online Social Networks (OSN). The main purpose of our classification is to recognize the patterns of users from Human, Bots or Cyborgs. Classic and consolidated approaches of Text Mining employ textual features from Natural Language Processing (NLP) for classification, but some drawbacks as computational cost, the huge amount of data could rise in real-life scenarios. This work uses an approach based on statistical frequency parameters of the user posting to distinguish the types of users without textual content. We perform the experiment over a Twitter dataset and as learn-based algorithms in classification task we compared Random Forest (RF), Support Vector Machine (SVM), k-nearest Neighbors (k-NN), Gradient Boosting Machine (GBM) and Extreme Gradient Boosting (XGBoost). Using the standard parameters of each algorithm, we achieved accuracy results of 88% and 84% by RF and XGBoost, respectively.File | Dimensione | Formato | |
---|---|---|---|
user.pdf
accesso aperto
Tipologia:
Publisher's version/PDF
Dimensione
417.31 kB
Formato
Adobe PDF
|
417.31 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.