Online Web Bot Detection Using a Sequential Classification Approach / A. Cabri, G. Suchacka, S. Rovetta, F. Masulli - In: 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). [s.l.]: IEEE, 2019. - ISBN 978-1-5386-6614-2. - pp. 1536-1540 (conference: 20th International Conference on High Performance Computing and Communications, 16th IEEE International Conference on Smart City and 4th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, held in Exeter in 2018) [10.1109/HPCC/SmartCity/DSS.2018.00252].

Online Web Bot Detection Using a Sequential Classification Approach

A. Cabri (first author); 2019

Abstract

A significant problem nowadays is the detection of Web traffic generated by automatic software agents (Web bots). Several studies have addressed this task by proposing approaches to Web traffic classification that distinguish traffic originating from human users' visits from traffic generated by bots. Most previous works addressed offline bot recognition, based on information available about user sessions already completed on a Web server. Very few approaches, however, have been proposed to recognize bots online, before the session completes. This paper proposes a novel approach to binary classification of a multivariate data stream incoming to a Web server, in order to label ongoing user sessions as generated by bots or by humans. The approach combines deep neural networks with Wald's Sequential Probability Ratio Test (SPRT) to model the relationship between subsequent HTTP requests in an ongoing session and to assess, before the session ends, the likelihood that it was generated by a bot or by a human. Experimental results showed that the proposed approach detects Web bots online with high performance scores and a small number of false negatives, as evidenced by the Recall index, while minimizing the impact on human visitors. Another valuable indicator is the speed of decision: the method classifies nearly all sessions very quickly, leaving only very few of them undecided.
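To illustrate how Wald's SPRT can turn per-request evidence into an early session-level decision, here is a minimal sketch. It is not the authors' implementation: it assumes that some trained classifier (standing in for the paper's deep neural network, and stubbed here as a plain list of floats) outputs, for each incoming HTTP request, a posterior probability that the session is a bot, and it treats requests as conditionally independent given the class. That is a simplification, since the paper's networks explicitly model the relationship between subsequent requests. The names `sprt_thresholds`, `classify_session`, `bot_probs`, `alpha`, `beta` and `prior_bot` are hypothetical. Each posterior is converted to a log-likelihood ratio by subtracting the prior log-odds. The running sum is then compared with Wald's approximate thresholds.

```python
import math
from typing import Iterable, Literal, Tuple

Decision = Literal["bot", "human", "undecided"]


def sprt_thresholds(alpha: float, beta: float) -> Tuple[float, float]:
    """Wald's approximate log-thresholds (lower, upper) for H1 = bot vs H0 = human.

    alpha: tolerated probability of labelling a human session as a bot (type I error).
    beta:  tolerated probability of labelling a bot session as human (type II error).
    """
    upper = math.log((1.0 - beta) / alpha)   # accept H1 (bot) at or above this
    lower = math.log(beta / (1.0 - alpha))   # accept H0 (human) at or below this
    return lower, upper


def classify_session(
    bot_probs: Iterable[float],
    alpha: float = 0.01,
    beta: float = 0.05,
    prior_bot: float = 0.5,
    eps: float = 1e-6,
) -> Tuple[Decision, int]:
    """Sequentially test an ongoing session, one request at a time.

    bot_probs: hypothetical per-request posteriors P(bot | request) from a classifier.
    prior_bot: class prior the classifier was trained under (assumption).
    Returns the decision and the number of requests consumed before deciding.
    """
    lower, upper = sprt_thresholds(alpha, beta)
    prior_log_odds = math.log(prior_bot / (1.0 - prior_bot))
    llr = 0.0  # cumulative log-likelihood ratio log p(x_1..x_n | bot) / p(x_1..x_n | human)
    n = 0
    for n, p in enumerate(bot_probs, start=1):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0) on saturated outputs
        # Posterior log-odds minus prior log-odds = per-request log-likelihood ratio.
        llr += math.log(p / (1.0 - p)) - prior_log_odds
        if llr >= upper:
            return "bot", n
        if llr <= lower:
            return "human", n
    # Session ended (or stream exhausted) before either threshold was crossed.
    return "undecided", n


if __name__ == "__main__":
    print(classify_session([0.9, 0.95, 0.9]))   # strong bot evidence
    print(classify_session([0.1, 0.2, 0.05]))   # strong human evidence
    print(classify_session([0.6, 0.4, 0.55]))   # ambiguous session
```

With the default `alpha=0.01` and `beta=0.05`, the upper threshold is log(95), roughly 4.55, and the lower one is log(0.05/0.99), roughly -2.99. The first two example sessions are therefore decided after only two requests, while the ambiguous one ends undecided. Choosing a small `alpha` makes the test reluctant to flag human visitors, at the price of requiring more evidence before a bot is declared.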
Keywords: HTTP request analysis; Internet security; Machine learning; Neural networks; Sequential classification; Web bot detection
Scientific-disciplinary sector: INF/01 - Computer Science
2019
Book Part (author)
Files in this item:
2018-DSS-bot.pdf (publisher's version/PDF, Adobe PDF, 179.29 kB; restricted access)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/955217
Citations
  • PubMed Central: n/a
  • Scopus: 20
  • Web of Science (ISI): 13