
Explaining Microservices' Cascading Failures From Their Logs

  • Additional information
    • Publication information:
      Wiley
    • Subject:
      2024
    • Collection:
      Wiley Online Library (Open Access Articles via Crossref)
    • Abstract:
      Context: Identifying the possible root causes of observed failures is crucial in microservice applications, as is explaining how those root failures propagated across the microservices forming an application. This can help pick countermeasures that prevent observed failures from happening again, e.g., by introducing circuit breakers or bulkheads that stop the root failures from propagating and causing those observed.
      Objective: This paper aims to enable explaining observed failures in microservice applications, either by searching for all possible cascading failures or by focusing only on those starting from a known root cause.
      Method: We propose a log-based root cause analysis technique, which declaratively determines the cascading failures that possibly caused an observed failure. We also enable using the technique in practice, by introducing a logging methodology for instrumenting applications to log their failures and service interactions, and by enabling the analysis of such logs through yRCA, a prototype implementation of the technique.
      Results: The practical usability of the proposed technique is assessed by means of a case study and controlled experiments. The case study shows the low effort needed to instrument a third-party application to produce the logs required by the technique, and the technique's effectiveness in explaining injected failures. The controlled experiments further assess the technique's effectiveness and performance in explaining failures obtained with an existing chaos testbed.
      Conclusion: The proposed technique can help identify the cascading failures that possibly caused an observed failure in a microservice application. It can be used to determine all possible cascading failures, or to explain how cascading failures propagated from a known root cause (e.g., one identified with some other existing root cause analyser).
    • Identifier:
      10.1002/spe.3400
    • Online access:
      https://doi.org/10.1002/spe.3400
      https://onlinelibrary.wiley.com/doi/pdf/10.1002/spe.3400
    • Rights:
      http://creativecommons.org/licenses/by-nc-nd/4.0/
    • Identifier:
      edsbas.9EB779B3