SURVEY OF RESILIENCE STRATEGIES IN CLOUD PLATFORMS: FROM FAULT DETECTION TO AUTO-RECOVERY
Main Article Content
Abstract
The increasing reliance on cloud computing in modern digital ecosystems has made resilience a critical attribute for sustaining service continuity and trust. This study explores resilience in cloud environments, examining its evolution from fault detection and fault tolerance to self-healing and adaptive recovery mechanisms. Key concepts such as reliability, availability, and fault tolerance are discussed alongside major sources of failures, including hardware, software, network, configuration errors, and security breaches. The survey highlights resilience strategies across multiple dimensions: cyber resilience frameworks that consolidate definitions and operational paradigms; certificateless auditing schemes that ensure secure and efficient data integrity in cloud storage; Byzantine fault tolerance methods applied to distributed systems; fault detection techniques leveraging prior knowledge and statistical models; middleware-based recovery in federated clouds; and machine learning–driven approaches that enhance fault detection through feature engineering. Collectively, these approaches reinforce the convergence of reliability, security, and adaptability in cloud infrastructures, underscoring resilience as a dynamic capability rather than a static safeguard. Emerging trends emphasize AI-driven resilience, multi-cloud interoperability, and automated decision-making, which are expected to shape next-generation cloud ecosystems. By synthesizing these strategies, the study offers a consolidated perspective on sustaining operational stability, scalability, and client satisfaction in increasingly complex and distributed computing environments.
Downloads
Article Details

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Download Copyright