Minimal-interval semantics [3] associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfying the query. Minimalinterval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents: thus, computing efficiently the antichains obtained by operations such as logic conjunction and disjunction is a basic issue. In this paper we provide the first algorithms for computing such operators that are linear in the number of intervals and logarithmic in the number of input antichains. The space used is linear in the number of antichains. Moreover, the algorithms are lazy - they do not assume random access to the input antichains. These properties make the usage of our algorithms feasible in large-scale web search engines.

Efficient lazy algorithms for minimal-interval semantics / Paolo Boldi, Sebastiano Vigna - In: String Processing and Information Retrieval : 13th International Conference, SPIRE 2006, Glasgow, UK, October 11-13, 2006 : Proceedings / [a cura di] Fabio Crestani, Paolo Ferragina, Mark Sanderson. - Berlin : Springer, 2006. - ISBN 3540457749. - pp. 134-149 (( Intervento presentato al 13th. convegno International Conference on String Processing and Information Retrieval tenutosi a Glasgow, UK nel 2006.

Efficient lazy algorithms for minimal-interval semantics

Paolo Boldi;Sebastiano Vigna
2006

Abstract

Minimal-interval semantics [3] associates with each query over a document a set of intervals, called witnesses, that are incomparable with respect to inclusion (i.e., they form an antichain): witnesses define the minimal regions of the document satisfying the query. Minimalinterval semantics makes it easy to define and compute several sophisticated proximity operators, provides snippets for user presentation, and can be used to rank documents: thus, computing efficiently the antichains obtained by operations such as logic conjunction and disjunction is a basic issue. In this paper we provide the first algorithms for computing such operators that are linear in the number of intervals and logarithmic in the number of input antichains. The space used is linear in the number of antichains. Moreover, the algorithms are lazy - they do not assume random access to the input antichains. These properties make the usage of our algorithms feasible in large-scale web search engines.
Settore INF/01 - Informatica
2006
Book Part (author)
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/22231
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 2
social impact