Large inverted indices are by now common in the construction of web-scale search engines. For faster access, inverted indices are indexed internally so that it is possible to skip quickly over unnecessary documents. The classical approach to skipping dictates that a skip should be positioned every f1/2 document pointers, where f is the overall number of documents where the term appears. We argue that due to the growing size of the web more refined techniques are necessary, and describe how to embed a compressed perfect skip list in an inverted list. We provide statistical models that explain the empirical distribution of the skip data we observe in our experiments, and use them to devise good compression techniques that allow us to limit the waste in space, so that the resulting data structure increases the overall index size by just a few percents, still making it possible to index pointers with a rather fine granularity.
Compressed perfect embedded skip lists for quick inverted-index lookups / Paolo Boldi, Sebastiano Vigna - In: String Processing and Information Retrieval : 12th International Conference, SPIRE 2005, Buenos Aires, Argentina, November 2-4, 2005 : Proceedings / Mariano Consens, Gonzalo Navarro. - Berlin : Springer Verlag, 2005. - ISBN 3540297405. - pp. 25-28 (( Intervento presentato al 12. convegno SPIRE 2005 tenutosi a Buenos Aires nel 2005.
|Titolo:||Compressed perfect embedded skip lists for quick inverted-index lookups|
|Settore Scientifico Disciplinare:||Settore INF/01 - Informatica|
|Data di pubblicazione:||2005|
|Digital Object Identifier (DOI):||http://dx.doi.org/10.1007/11575832_3|
|Tipologia:||Book Part (author)|
|Appare nelle tipologie:||03 - Contributo in volume|