The increasing demand for flexibility and scalability in dynamically obtaining and releasing computing resources in a cost-effective and device-independent manner, and easiness in hosting applications without the burden of installation and maintenance, has resulted in a wide adoption of the cloud computing paradigm. While the benefits are immense, this computing paradigm is still vulnerable to a large number of system failures; as a consequence, users have become increasingly concerned about the reliability and availability of cloud computing services. Fault tolerance and resilience serve as an effective means to address users’ reliability and availability concerns. In this chapter, we focus on characterizing the recurrent failures in a typical cloud computing environment, analyzing the effects of failures on users’ applications and surveying fault tolerance solutions corresponding to each class of failures. We also discuss the perspective of offering fault tolerance as a service to users’ applications as one of the effective means of addressing users’ reliability and availability concerns.

Fault tolerance and resilience in Cloud computing environments / R. Jhawar, V. Piuri - In: Computer and information security handbook / [a cura di] J.R. Vacca. - Riedizione. - [s.l] : Elsevier, 2013. - ISBN 9780123943972. - pp. 125-141 [10.1016/B978-0-12-394397-2.00007-6]

Fault tolerance and resilience in Cloud computing environments

R. Jhawar;V. Piuri
2013

Abstract

The increasing demand for flexibility and scalability in dynamically obtaining and releasing computing resources in a cost-effective and device-independent manner, and easiness in hosting applications without the burden of installation and maintenance, has resulted in a wide adoption of the cloud computing paradigm. While the benefits are immense, this computing paradigm is still vulnerable to a large number of system failures; as a consequence, users have become increasingly concerned about the reliability and availability of cloud computing services. Fault tolerance and resilience serve as an effective means to address users’ reliability and availability concerns. In this chapter, we focus on characterizing the recurrent failures in a typical cloud computing environment, analyzing the effects of failures on users’ applications and surveying fault tolerance solutions corresponding to each class of failures. We also discuss the perspective of offering fault tolerance as a service to users’ applications as one of the effective means of addressing users’ reliability and availability concerns.
cloud computing; fault model; architecture; servers; network; fault tolerance; crash failures; byzantine failures; checking; monitoring
Settore ING-INF/05 - Sistemi di Elaborazione delle Informazioni
2013
Book Part (author)
File in questo prodotto:
File Dimensione Formato  
cis_2013_cloud.pdf

accesso riservato

Tipologia: Pre-print (manoscritto inviato all'editore)
Dimensione 2.75 MB
Formato Adobe PDF
2.75 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2434/228421
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 35
  • ???jsp.display-item.citation.isi??? ND
social impact