IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples. Given ε>0 our algorithm guarantees that, at every point in time, every node of the decision tree uses a split with Gini gain within an additive ε of the optimum. For real-valued features the algorithm has an amortized running time per insertion/deletion of O((d·log³n)/ε²), which improves to O((d·log²n)/ε) for binary or categorical features, while it uses space O(n·d), where n is the maximum number of examples at any point in time and d is the number of features. Our algorithm is nearly optimal, as we show that any algorithm with similar guarantees requires amortized running time Ω(d) and space Ω(n·d/polylog(nd)). We complement our theoretical results with an extensive experimental evaluation on real-world data, showing the effectiveness of our algorithm.

Fully-Dynamic Decision Trees / M. Bressan, G. Damay, M. Sozio (PROCEEDINGS OF THE ... AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE). - In: AAAI 2023 : Proceedings of the 37th AAAI Conference on Artificial Intelligence[s.l] : AAAI Press, 2023. - ISBN 978-1-57735-880-0. - pp. 6842-6849 (( Intervento presentato al 37. convegno Thirty-Seventh AAAI Conference on Artificial Intelligence : February 7–14 tenutosi a Washington (D.C., USA) nel 2023 [10.1609/aaai.v37i6.25838].

Fully-Dynamic Decision Trees

M. Bressan^Primo;Damay G.;Sozio M.

2023

Abstract

We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples. Given ε>0 our algorithm guarantees that, at every point in time, every node of the decision tree uses a split with Gini gain within an additive ε of the optimum. For real-valued features the algorithm has an amortized running time per insertion/deletion of O((d·log³n)/ε²), which improves to O((d·log²n)/ε) for binary or categorical features, while it uses space O(n·d), where n is the maximum number of examples at any point in time and d is the number of features. Our algorithm is nearly optimal, as we show that any algorithm with similar guarantees requires amortized running time Ω(d) and space Ω(n·d/polylog(nd)). We complement our theoretical results with an extensive experimental evaluation on real-world data, showing the effectiveness of our algorithm.

Scheda breve

Scheda completa

Scheda completa (DC)

	Parole chiave
	
				ML: Classification and Regression, DMKM: Data Stream Mining, ML: Scalability of ML Systems;
			
	Settori scientifico-disciplinari del contributo (sola visualizzazione)
	
				Settore INF/01 - Informatica
			
	Data di pubblicazione
	
				2023
			
	Enti collegati al convegno
	
				Association for the Advancement of Artificial Intelligence (AAAI)
			
	DOI
	
				https://dx.doi.org/10.1609/aaai.v37i6.25838
			
	Tipologia
	
				Book Part (author)
			
	Appare nelle tipologie:
	
				03 - Contributo in volume

File in questo prodotto:

File	Dimensione	Formato
2023-AAAI.pdf accesso riservato Tipologia: Publisher's version/PDF Dimensione 3.91 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	3.91 MB	Adobe PDF	Visualizza/Apri Richiedi una copia

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/1033729

Citazioni

ND

3

0

ND

social impact