IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca (Institutional Research Archive)


Efficient on-the-fly Web bot detection / G. Suchacka, A. Cabri, S. Rovetta, F. Masulli. - In: KNOWLEDGE-BASED SYSTEMS. - ISSN 0950-7051. - 223:(2021 Jul), pp. 107074.1-107074.16. [10.1016/j.knosys.2021.107074]

Efficient on-the-fly Web bot detection

Suchacka G. (first author); Cabri A. (second author); Rovetta S.; Masulli F. (last author)

2021

Abstract

A large fraction of traffic on present-day Web servers is generated by bots, intelligent agents able to traverse the Web and execute various advanced tasks. Since bots’ activity may raise concerns about server security and performance, many studies have investigated traffic features that discriminate bots from human visitors and have developed methods for automated traffic classification. Very few previous works, however, aim to identify bots on the fly, that is, to classify active sessions as early as possible. This paper proposes a novel method for binary classification of streams of Web server requests that labels each active session as “bot” or “human”. A machine learning approach has been developed to discover traffic patterns from historical usage data. The model, built on a neural network, classifies each incoming HTTP request. A sequential probabilistic analysis is then applied to capture relationships between subsequent HTTP requests in an ongoing session and to assess, as early as possible, the likelihood that the session was generated by a bot or by a human. A performance evaluation study with real server traffic data confirmed the effectiveness of the proposed classifier in discriminating bots from humans at early stages of their visits, leaving very few sessions undecided and producing a very low number of false positives.
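To illustrate the general idea of an early, sequential decision over a stream of per-request classifier outputs, the sketch below uses Wald's sequential probability ratio test (SPRT) as a stand-in. This is not the authors' formulation: it assumes that some per-request classifier (such as the neural network mentioned above) already returns a bot probability for each request, that the classifier was trained on balanced classes (so its posterior odds can be read as a likelihood ratio), and that requests are conditionally independent, which is a simplification of the inter-request relationships the paper models. The names `classify_session` and `sprt_thresholds` and the error rates `alpha` and `beta` are hypothetical.

```python
import math
from typing import Iterable, Tuple


def sprt_thresholds(alpha: float, beta: float) -> Tuple[float, float]:
    """Wald's approximate thresholds on the cumulative log-likelihood ratio.

    alpha: tolerated rate of labeling a human session as "bot" (false positive).
    beta:  tolerated rate of labeling a bot session as "human" (false negative).
    """
    upper = math.log((1 - beta) / alpha)  # crossing above -> decide "bot"
    lower = math.log(beta / (1 - alpha))  # crossing below -> decide "human"
    return lower, upper


def classify_session(bot_probs: Iterable[float], alpha: float = 0.01,
                     beta: float = 0.05, eps: float = 1e-6) -> Tuple[str, int]:
    """Consume per-request bot probabilities in arrival order and stop early.

    Returns (label, number of requests consumed). The label is "undecided"
    if the stream ends before either threshold is crossed.
    """
    lower, upper = sprt_thresholds(alpha, beta)
    llr = 0.0  # cumulative log-likelihood ratio, bot vs. human
    n = 0
    for n, p in enumerate(bot_probs, start=1):
        p = min(max(p, eps), 1 - eps)  # clip so the logarithm stays finite
        # Assumption: with balanced training classes, p / (1 - p) approximates
        # the per-request likelihood ratio P(request | bot) / P(request | human).
        llr += math.log(p / (1 - p))
        if llr >= upper:
            return "bot", n
        if llr <= lower:
            return "human", n
    return "undecided", n
```

For example, `classify_session([0.9, 0.9, 0.9])` returns `("bot", 3)`: each request adds log 9 ≈ 2.20 to the running sum, which first exceeds the upper threshold log 95 ≈ 4.55 after the third request. The explicit "undecided" outcome mirrors the abstract's note that a few sessions remain unclassified, and lowering `alpha` trades later decisions for fewer false positives.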


Keywords: early decision; Internet robot; machine learning; neural network; real-time bot detection; sequential analysis; Web bot

Scientific-disciplinary sector: INF/01 - Computer Science

Publication date: Jul 2021

Journal in ANCE: KNOWLEDGE-BASED SYSTEMS

DOI: https://dx.doi.org/10.1016/j.knosys.2021.107074

Type: Article (author)

Appears in types: 01 - Journal article

Files in this record:

File	Size	Format
1-s2.0-S0950705121003373-main.pdf (open access; type: Publisher's version/PDF)	2.29 MB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/955215
