Adaptive Web Data Extraction Policies / G. Fiumara, M. Marchi, A. Provetti. - In: ATTI DELLA ACCADEMIA PELORITANA DEI PERICOLANTI, CLASSE DI SCIENZE FISICHE MATEMATICHE E NATURALI. - ISSN 0365-0359. - 86:2(2008 Apr). [10.1478/C1A0802011]

Adaptive Web Data Extraction Policies

M. Marchi; A. Provetti
2008

Abstract

Web data extraction involves, among other tasks, routinely accessing and downloading data from continuously updated dynamic Web pages. There is a significant trade-off between the rate at which external Web sites are accessed and the computational burden on the accessing client. We address this problem by proposing a predictive model, of a kind typical of the Operating Systems literature, for the rate of update of each Web source. The model has been implemented in a new version of the Dynamo project, a middleware that assists in generating informative RSS feeds from traditional HTML Web sites. To be effective (i.e., to keep RSS feeds timely and informative) and scalable, Dynamo needs careful tuning and customization of its polling policies, which are described in detail.
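
The abstract does not say which predictive model is used. As an illustration only, the sketch below assumes exponential averaging, the estimator that Operating Systems textbooks use to predict CPU burst lengths, applied here to the time between updates of a Web source. The class name, the function name, the smoothing factor and the delay bounds are all hypothetical and are not taken from Dynamo.

```python
from dataclasses import dataclass


@dataclass
class UpdateIntervalPredictor:
    """Exponential-average estimate of the time between updates of one source.

    Assumption: this is a textbook estimator chosen for illustration,
    not necessarily the model implemented in Dynamo.
    """

    alpha: float = 0.5        # weight of the latest observation (assumed value)
    predicted: float = 600.0  # initial guess, in seconds (assumed value)

    def observe(self, measured_interval: float) -> float:
        """Fold a newly measured inter-update interval into the prediction."""
        self.predicted = (
            self.alpha * measured_interval + (1.0 - self.alpha) * self.predicted
        )
        return self.predicted


def next_poll_delay(
    predictor: UpdateIntervalPredictor,
    min_delay: float = 60.0,      # hypothetical lower bound, protects the remote site
    max_delay: float = 86400.0,   # hypothetical upper bound, keeps the feed fresh
) -> float:
    """Clamp the predicted interval to a polite and useful polling delay."""
    return max(min_delay, min(max_delay, predictor.predicted))


if __name__ == "__main__":
    p = UpdateIntervalPredictor(alpha=0.5, predicted=600.0)
    for interval in (300.0, 150.0):
        p.observe(interval)
    print(next_poll_delay(p))
```

With a larger `alpha` the prediction tracks recent behaviour of the source more closely; with a smaller one it changes more slowly and is less sensitive to a single unusual interval. The clamping step reflects the trade-off stated in the abstract: polling too often burdens both client and source, while polling too rarely makes the feed stale.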
Sector INF/01 - Computer Science
Apr 2008
29 Nov 2007
https://cab.unime.it/journals/index.php/AAPP/issue/view/Vol86_Issue2
Article (author)
File in this record: 374-721-1-PB.pdf (open access; publisher's version/PDF; Adobe PDF, 215.24 kB)
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/908172