On the generalization ability of on-line learning algorithms / N. Cesa-Bianchi, A. Conconi, C. Gentile. - In: IEEE TRANSACTIONS ON INFORMATION THEORY. - ISSN 0018-9448. - 50:9(2004), pp. 2050-2057. [10.1109/TIT.2004.833339]

On the generalization ability of on-line learning algorithms

N. Cesa-Bianchi; A. Conconi; C. Gentile
2004

Abstract

In this paper we show how to extract a hypothesis with small risk from the ensemble of hypotheses generated by an arbitrary on-line learning algorithm run on an i.i.d. sample of data. Using a simple large deviation argument, we prove tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic M_n associated with the on-line performance of the ensemble. Via sharp pointwise bounds on M_n, we then obtain risk tail bounds for kernel Perceptron algorithms in terms of the spectrum of the empirical kernel matrix. These bounds reveal that the linear hypotheses found via our approach achieve optimal trade-offs between hinge loss and margin size over the class of all linear functions, an issue that was left open by previous results. A distinctive feature of our approach is that the key tools for our analysis come from the model of prediction of individual sequences; i.e., a model making no probabilistic assumptions on the source generating the data. In fact, these tools turn out to be so powerful that we only need very elementary statistical facts to obtain our final risk bounds.
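
The abstract outlines an online-to-batch conversion: run an arbitrary on-line learner (e.g., a kernel Perceptron) once over the i.i.d. sample, keep the ensemble of intermediate hypotheses together with the on-line performance statistic M_n, and then select a single hypothesis whose risk can be bounded in terms of M_n. The Python sketch below illustrates that pipeline under stated assumptions; the Gaussian kernel, the toy data, and the penalized selection rule (including its constants) are illustrative choices, not the paper's exact procedure.

```python
# Minimal sketch of the online-to-batch idea described in the abstract.
# Assumptions (not from the paper): Gaussian kernel, toy labels, and a
# Hoeffding-style penalized selection rule with illustrative constants.
import numpy as np


def gaussian_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X1 and X2."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)


def kernel_perceptron(X, y, gamma=1.0):
    """Single pass of the kernel Perceptron over (X, y).

    Returns the ensemble of dual coefficient vectors (one hypothesis per
    round), the kernel matrix, and the mistake count M_n.
    """
    n = len(y)
    K = gaussian_kernel(X, X, gamma)
    alpha = np.zeros(n)
    ensemble, mistakes = [], 0
    for t in range(n):
        ensemble.append(alpha.copy())        # hypothesis in force at round t
        margin = (alpha * y) @ K[:, t]       # prediction before seeing y[t]
        if y[t] * margin <= 0:               # mistake: standard dual update
            alpha[t] += 1.0
            mistakes += 1
    return ensemble, K, mistakes


def select_hypothesis(ensemble, K, y, delta=0.05):
    """Pick the round whose hypothesis minimizes empirical error on the
    examples it has not yet seen, plus a large-deviation style penalty
    (illustrative constants, not the paper's)."""
    n = len(y)
    best_round, best_score = 0, np.inf
    for i, alpha in enumerate(ensemble):
        unseen = np.arange(i, n)             # examples after round i
        preds = np.sign((alpha * y) @ K[:, unseen])
        err = np.mean(preds != y[unseen])
        penalty = np.sqrt(np.log(n / delta) / (2 * (n - i)))
        if err + penalty < best_score:
            best_round, best_score = i, err + penalty
    return best_round


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)   # toy nonlinear labels
    ensemble, K, M_n = kernel_perceptron(X, y, gamma=0.5)
    best = select_hypothesis(ensemble, K, y)
    print(f"mistakes M_n = {M_n}, hypothesis selected from round {best}")
```

In this sketch M_n is simply the Perceptron's mistake count, and the selection rule penalizes each hypothesis by a confidence term over the examples it has not yet seen, in the spirit of the large deviation argument mentioned in the abstract.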
Disciplinary sector INF/01 - Computer Science
Article (author)


Use this identifier to cite or link to this document: https://hdl.handle.net/2434/141718
Citations
  • PMC: ND
  • Scopus: 339
  • ISI Web of Science: 219