Abstract: Online Failure Prediction (OFP) is a technique that attempts to predict incoming failures to mitigate their consequences. Machine Learning (ML) has been successfully used to create predictive models for OFP, but failures are rare, and thus failure data are typically not available. Fault injection has been accepted as a viable alternative to generate failure data. However, this raises several challenges, such as how to process the data to create and assess predictive models. The characteristics of fault injection campaigns (e.g., repeated/controlled experiments) and OFP (e.g., autocorrelation) require specific considerations. This work proposes a six-stage methodology for using failure data generated through fault injection to create accurate/representative models for OFP. It overviews the various phases, from generating and processing the data, to creating and assessing the performance of the models up to their deployment, while considering the intrinsic characteristics of the problem. As a case study, we apply the methodology to develop failure predictors for the Linux Operating System (OS). Results show that the proposed methodology led to accurate predictive models that could also generalize to failures that occur under different execution profiles, whilst using traditional techniques resulted in over-optimistic observations. ; This work has been partially supported by Project “Agenda Mobilizadora Sines Nexus” (ref. No. 7113), supported by the Recovery and Resilience Plan (PRR) and by the European Funds Next Generation EU, following Notice No. 02/C05-i01/2022, Component 5 - Capitalization and Business Innovation - Mobilizing Agendas for Business Innovation and by the FCT, I.P./MCTES through national funds (PIDDAC), within the scope of CISUC R&D Unit – UIDB/00326/2020 or project code UIDP/00326/2020.