IRIS Institutional Research Information System - AIR Archivio Istituzionale della Ricerca

Large inverted indices are by now common in the construction of web-scale search engines. For faster access, inverted indices are indexed internally so that it is possible to skip quickly over unnecessary documents. The classical approach to skipping dictates that a skip should be positioned every f1/2 document pointers, where f is the overall number of documents where the term appears. We argue that due to the growing size of the web more refined techniques are necessary, and describe how to embed a compressed perfect skip list in an inverted list. We provide statistical models that explain the empirical distribution of the skip data we observe in our experiments, and use them to devise good compression techniques that allow us to limit the waste in space, so that the resulting data structure increases the overall index size by just a few percents, still making it possible to index pointers with a rather fine granularity.

Compressed perfect embedded skip lists for quick inverted-index lookups / Paolo Boldi, Sebastiano Vigna - In: String Processing and Information Retrieval : 12th International Conference, SPIRE 2005, Buenos Aires, Argentina, November 2-4, 2005 : Proceedings / Mariano Consens, Gonzalo Navarro. - Berlin : Springer Verlag, 2005. - ISBN 3540297405. - pp. 25-28 (( Intervento presentato al 12. convegno SPIRE 2005 tenutosi a Buenos Aires nel 2005 [10.1007/11575832_3].

Compressed perfect embedded skip lists for quick inverted-index lookups

P. Boldi;S. Vigna

2005

Abstract

Large inverted indices are by now common in the construction of web-scale search engines. For faster access, inverted indices are indexed internally so that it is possible to skip quickly over unnecessary documents. The classical approach to skipping dictates that a skip should be positioned every f1/2 document pointers, where f is the overall number of documents where the term appears. We argue that due to the growing size of the web more refined techniques are necessary, and describe how to embed a compressed perfect skip list in an inverted list. We provide statistical models that explain the empirical distribution of the skip data we observe in our experiments, and use them to devise good compression techniques that allow us to limit the waste in space, so that the resulting data structure increases the overall index size by just a few percents, still making it possible to index pointers with a rather fine granularity.

Scheda breve

Scheda completa

Scheda completa (DC)

	Settori scientifico-disciplinari del contributo
	
			Settore INF/01 - Informatica
		
	Data di pubblicazione
	
			2005
		
	DOI
	
			https://dx.doi.org/10.1007/11575832_3
		
	Tipologia
	
			Book Part (author)
		
	Appare nelle tipologie:
	
			03 - Contributo in volume

File in questo prodotto:

Non ci sono file associati a questo prodotto.

Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/4698

Citazioni

ND

20

13

social impact