A Critical Analysis of Classifier Selection in Learned Bloom Filters: The Essentials

Malchiodi, D.; Raimondi, D.; Fumagalli, G.; Giancarlo, R.; Frasca, M.

doi:10.1007/978-3-031-34204-2_5

It is well known that Bloom Filters have a performance essentially independent of the data used to query the filters themselves, but this is no longer true when considering Learned Bloom Filters. In this work we analyze how the performance of such learned data structures is impacted by the classifier chosen to build the filter and by the complexity of the dataset used in the training phase. Such analysis, which has not been proposed so far in the literature, involves the key performance indicators of space efficiency, false positive rate, and reject time. By screening various implementations of Learned Bloom Filters, our experimental study highlights that only one of these implementations exhibits higher robustness to classifier performance and to noisy data, and that only two families of classifiers have desirable properties in relation to the previous performance indicators.
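The abstract does not spell out how a Learned Bloom Filter is built, so the following is only a minimal sketch of the basic construction commonly described in the literature: a classifier screens each query, and a small backup Bloom filter stores the keys the classifier would wrongly reject. It is not the authors' implementation, and the paper screens several variants not shown here. The names `BloomFilter`, `LearnedBloomFilter`, `toy_score` and `empirical_fpr`, the threshold `tau`, and the sizes `m` and `k` are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings, using m bits and k hash functions."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class LearnedBloomFilter:
    """Classifier plus a backup Bloom filter holding the classifier's false negatives."""

    def __init__(self, keys, score, tau: float, m: int, k: int):
        self.score, self.tau = score, tau
        self.backup = BloomFilter(m, k)
        for key in keys:
            if score(key) < tau:  # the classifier alone would reject this true key
                self.backup.add(key)

    def __contains__(self, key: str) -> bool:
        # Accept if the classifier is confident, otherwise fall back to the backup filter.
        return self.score(key) >= self.tau or key in self.backup


def toy_score(key: str) -> float:
    # Hypothetical stand-in for a trained classifier (assumption, not from the paper).
    return 0.9 if key.startswith("http://bad") else 0.1


def empirical_fpr(lbf, non_keys) -> float:
    # Fraction of known non-keys that the filter wrongly accepts.
    non_keys = list(non_keys)
    return sum(key in lbf for key in non_keys) / len(non_keys)
```

Under these assumptions, the indicators named in the abstract map onto the sketch as follows: space is the classifier's size plus the `m` bits of the backup filter, the false positive rate can be estimated with `empirical_fpr` on a sample of non-keys, and reject time is the time taken to answer a query for a non-key. Because the classifier sits on the query path, all three depend on which classifier is chosen and on how well it separates keys from non-keys.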

A Critical Analysis of Classifier Selection in Learned Bloom Filters: The Essentials / D. Malchiodi, D. Raimondi, G. Fumagalli, R. Giancarlo, M. Frasca (COMMUNICATIONS IN COMPUTER AND INFORMATION SCIENCE). - In: Engineering Applications of Neural Networks / edited by L. Iliadis, I. Maglogiannis, S. Alonso, C. Jayne, E. Pimenidis. - [s.l] : Springer Nature, 2023. - ISBN 978-3-031-34203-5. - pp. 47-61 (Paper presented at the 24th International Conference, EAAAI/EANN, held in León in 2023) [10.1007/978-3-031-34204-2_5].

D. Malchiodi; Davide Raimondi; Giacomo Fumagalli; Raffaele Giancarlo; M. Frasca

2023

Keywords: Learned Bloom filters; Data complexity; Learned data structures
Scientific-disciplinary sector of the contribution (display only): Settore INF/01 - Informatica
Project title: Multi-criteria optimized data structures: from compressed indexes to learned indexes, and beyond
Funder: MINISTERO DELL'ISTRUZIONE E DEL MERITO
Contract no.: 2017WR7SHH_004
Publication date: 2023
DOI: https://dx.doi.org/10.1007/978-3-031-34204-2_5
Type: Book Part (author)
Appears in types: 03 - Contribution in volume

Files in this record:

File	Size	Format
EANN-2023-published.pdf (restricted access; description: published work; type: Publisher's version/PDF)	1.37 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/981848

IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca