
Worst-case analysis of selective sampling for linear-threshold algorithms / N. Cesa-Bianchi, C. Gentile, L. Zaniboni. - In: Advances in Neural Information Processing Systems 17 / Lawrence K. Saul, Yair Weiss, Léon Bottou (eds.). - Cambridge, MA (USA) : MIT Press, 2005. - ISBN 0262195348. - pp. 241-248. (Presented at the 17th Neural Information Processing Systems conference, held in Vancouver, Canada, in 2004.)

Worst-case analysis of selective sampling for linear-threshold algorithms

N. Cesa-Bianchi; C. Gentile; L. Zaniboni
2005

Abstract

A selective sampling algorithm is a learning algorithm for classification that, based on previously observed data, decides whether to sample the label of each new instance to be classified. In this paper we introduce a general technique for turning linear-threshold classification algorithms from the general additive family into randomized selective sampling algorithms. For the most popular algorithms in this family we derive mistake bounds that hold for individual sequences of examples. These bounds show that our semi-supervised algorithms can achieve, on average, the same accuracy as their fully supervised counterparts, while using far fewer labels. Our theoretical results are corroborated by a number of experiments on real-world textual data. The outcome of these experiments is essentially predicted by our theoretical results: our selective sampling algorithms tend to perform as well as the algorithms receiving the true label after each classification, while observing in practice substantially fewer labels.
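The randomized sampling rule described in the abstract can be sketched for the Perceptron, the simplest member of the additive family: the label of each instance is queried with probability b/(b + |margin|), so instances on which the current classifier is uncertain (small margin) are queried more often, and an update is made only on queried mistakes. This is a minimal illustrative sketch, not the paper's exact algorithm; the parameter b, the data setup, and the function name are assumptions.

```python
import numpy as np

def selective_sampling_perceptron(X, y, b=1.0, seed=0):
    """Sketch of a selective-sampling Perceptron (illustrative).

    Queries each label with probability b / (b + |margin|) and performs
    the standard Perceptron update only on queried mistakes.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    queries = mistakes = 0
    for x_t, y_t in zip(X, y):
        margin = w @ x_t
        pred = 1 if margin >= 0 else -1
        if pred != y_t:
            mistakes += 1  # counted for evaluation only; label not yet seen
        # Randomized sampling rule: small |margin| -> high query probability.
        if rng.random() < b / (b + abs(margin)):
            queries += 1
            if pred != y_t:
                w = w + y_t * x_t  # standard Perceptron update
    return w, queries, mistakes
```

Larger values of the (assumed) parameter b make the algorithm query more labels, trading label cost for accuracy.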
selective sampling; semi-supervised learning; on-line learning; kernel algorithms; linear-threshold classifiers
Research field: INF/01 - Computer Science (Settore INF/01 - Informatica)
2005
Book Part (author)
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/9892
Citations
  • Scopus 2