Data warehouses are increasingly storing and managing large scale string data, and dealing with large volume of transactions that update and search string data. Motivated by this context, we initiate the study of self-adjusting data structures for string dictionary operations, that is, data structures that are designed to be efficient on an entire sequence rather than individual string operations. Furthermore, we study this problem in the external memory model where string data is too massive to be stored in internal memory and has to reside in disks; each access to a disk page fetches B items, and the cost of the operations is the number of pages accessed (I/Os). We show that given a dictionary of n strings S-1,...,S-n of total length N, a sequence of m string searches S-i1,Si-2,...,S-im takes O(Sigma(j=1)(m)((B)/(\Sij\)) + Sigma(i=1)(n)(n(i)log(Bni)/(m))) expected I/Os, where n(i) is the number of times S-i is queried. Inserting or deleting a string S takes O((\S\)(B) + log(B) n) expected amortized I/Os. The Sigma(j=1)(m) (\Sij\)(B) term is a lower bound for reading the input; the Sigma(i=1)(n) n(i) log(B) (ni)/(m) term is the entropy of the query sequence and is a standard information-theoretic lower bound. This is the Static Optimality Theorem for external-memory string access. The static optimality theorem was first formalized and proved by Tarjan and Sleator for numerical dictionary in their classic splay trees paper in 1985 [16]; they left open the B-tree case for numerical values (page 684), and a fortiori, the case of string data in an external-memory setting, that we settle here. We obtain our result not by using traditional "splay" operations on search trees as in [16], but by designing a novel and conceptually simple self-adjusting data structure based on the well-known skip lists.

Static optimality theorem for external memory string access / V. Ciriani, P. Ferragina, F. Luccio, S. Muthukrishnan - In: Foundations of Computer Science, 2002. Proceedings. The 43rd Annual IEEE Symposium onLos Alamitos : IEEE, 2002. - ISBN 0769518222. - pp. 219-227 (( Intervento presentato al 43. convegno Symposium on Foundations of Computer Science tenutosi a Vancouver nel 2002.

Static optimality theorem for external memory string access

V. Ciriani
Primo
;
2002

Abstract

Data warehouses are increasingly storing and managing large scale string data, and dealing with large volume of transactions that update and search string data. Motivated by this context, we initiate the study of self-adjusting data structures for string dictionary operations, that is, data structures that are designed to be efficient on an entire sequence rather than individual string operations. Furthermore, we study this problem in the external memory model where string data is too massive to be stored in internal memory and has to reside in disks; each access to a disk page fetches B items, and the cost of the operations is the number of pages accessed (I/Os). We show that given a dictionary of n strings S-1,...,S-n of total length N, a sequence of m string searches S-i1,Si-2,...,S-im takes O(Sigma(j=1)(m)((B)/(\Sij\)) + Sigma(i=1)(n)(n(i)log(Bni)/(m))) expected I/Os, where n(i) is the number of times S-i is queried. Inserting or deleting a string S takes O((\S\)(B) + log(B) n) expected amortized I/Os. The Sigma(j=1)(m) (\Sij\)(B) term is a lower bound for reading the input; the Sigma(i=1)(n) n(i) log(B) (ni)/(m) term is the entropy of the query sequence and is a standard information-theoretic lower bound. This is the Static Optimality Theorem for external-memory string access. The static optimality theorem was first formalized and proved by Tarjan and Sleator for numerical dictionary in their classic splay trees paper in 1985 [16]; they left open the B-tree case for numerical values (page 684), and a fortiori, the case of string data in an external-memory setting, that we settle here. We obtain our result not by using traditional "splay" operations on search trees as in [16], but by designing a novel and conceptually simple self-adjusting data structure based on the well-known skip lists.
Settore INF/01 - Informatica
2002
IEEE
Book Part (author)
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/63877
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 12
  • ???jsp.display-item.citation.isi??? 6
social impact