IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Worst-Case Analysis of Selective Sampling for Linear Classification / N. Cesa-Bianchi, C. Gentile, L. Zaniboni. - In: JOURNAL OF MACHINE LEARNING RESEARCH. - ISSN 1532-4435. - 7(2006), pp. 1205-1230.

Worst-Case Analysis of Selective Sampling for Linear Classification

N. Cesa-Bianchi; C. Gentile; L. Zaniboni

2006

Abstract

A selective sampling algorithm is a learning algorithm for classification that, based on the past observed data, decides whether to ask for the label of each new instance to be classified. In this paper, we introduce a general technique for turning linear-threshold classification algorithms from the general additive family into randomized selective sampling algorithms. For the most popular algorithms in this family we derive mistake bounds that hold for individual sequences of examples. These bounds show that our semi-supervised algorithms can achieve, on average, the same accuracy as that of their fully supervised counterparts, but using fewer labels. Our theoretical results are corroborated by a number of experiments on real-world textual data. The outcome of these experiments is essentially predicted by our theoretical results: our selective sampling algorithms tend to perform as well as the algorithms receiving the true label after each classification, while observing in practice substantially fewer labels.
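
The idea summarized in the abstract can be illustrated with a minimal sketch of a randomized selective sampling Perceptron. The abstract does not state the query rule, so the rule used here (ask for the label with probability b / (b + |margin|)) is an assumption taken from the selective sampling literature; the function name, the parameter `b`, and the seed are hypothetical and chosen only for illustration.

```python
import random


def selective_sampling_perceptron(examples, b=1.0, seed=0):
    """Illustrative sketch, not the authors' reference implementation.

    examples: iterable of (x, y), x a list of floats, y in {-1, +1}.
    b: assumed positive parameter trading label queries against accuracy.
    Returns (weights, mistakes, queries).
    """
    rng = random.Random(seed)
    w = None
    mistakes = 0
    queries = 0
    for x, y in examples:
        if w is None:
            w = [0.0] * len(x)
        # Linear-threshold prediction from the current weights.
        margin = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if margin >= 0 else -1
        # Mistakes are counted for evaluation only; the learner sees y
        # only when it decides to query the label.
        if y_hat != y:
            mistakes += 1
        # Assumed query rule: small |margin| (low confidence) makes a
        # label request more likely; margin 0 always triggers a query.
        if rng.random() < b / (b + abs(margin)):
            queries += 1
            if y_hat != y:
                # Standard Perceptron update on a queried mistake.
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes, queries
```

Under this assumed rule, a larger `b` requests more labels and moves the behavior closer to the fully supervised Perceptron, while a smaller `b` requests fewer labels.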

Keywords
	Kernel algorithms; Linear-threshold classifiers; On-line learning; Selective sampling; Semi-supervised learning

Scientific-disciplinary sectors of the article (display only)
	Sector INF/01 - Computer Science

Publication date
	2006

Journal in ANCE
	JOURNAL OF MACHINE LEARNING RESEARCH

URL
	http://jmlr.csail.mit.edu/papers/v7/cesa-bianchi06b.html

Type
	Article (author)

Appears in types:
	01 - Journal article

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/24242
