Dynamic robustness evaluation for automated model selection in operation

  • Additional Information
    • Subject:
      2025
    • Collection:
      Technical University of Denmark: DTU Orbit / Danmarks Tekniske Universitet
    • Abstract:
      Context: The increasing use of artificial neural network (ANN) classifiers in systems, especially safety-critical systems (SCSs), requires ensuring their robustness against out-of-distribution (OOD) shifts in operation, i.e., changes in the underlying data distribution relative to the data used to train the classifier. However, measuring the robustness of classifiers in operation with only unlabeled data is challenging. Additionally, machine learning engineers may need to compare different models or versions of the same model and switch to an optimal version based on their robustness. Objective: This paper explores the problem of dynamic robustness evaluation for automated model selection. We aim to find efficient and effective metrics for evaluating and comparing the robustness of multiple ANN classifiers using unlabeled operational data. Methods: To quantitatively measure the differences between the model outputs and assess robustness under OOD shifts using unlabeled data, we choose distance-based metrics. An empirical comparison of five such metrics, suitable for higher-dimensional data such as images, is performed. The selected metrics include Wasserstein distance (WD), maximum mean discrepancy (MMD), Hellinger distance (HL), Kolmogorov–Smirnov statistic (KS), and Kullback–Leibler divergence (KL), known for their efficacy in quantifying distribution differences. We evaluate these metrics on 20 state-of-the-art models (ten CIFAR10-based models, five CIFAR100-based models, and five ImageNet-based models) from a widely used robustness benchmark (RobustBench) using data perturbed with various types and magnitudes of corruptions to mimic real-world OOD shifts. Results: Our findings reveal that the WD metric outperforms the others when ranking multiple ANN models for CIFAR10- and CIFAR100-based models, while the KS metric demonstrates superior performance for ImageNet-based models. MMD can be used as a reliable second option for both datasets. Conclusion: This study highlights the effectiveness of distance-based metrics in ... (See the sketch after this record for an illustrative distance computation.)
    • File Description:
      application/pdf
    • Identifier:
      10.1016/j.infsof.2024.107603
    • Electronic Access:
      https://orbit.dtu.dk/en/publications/87324c21-0108-43e1-ba39-65349b10dab1
      https://doi.org/10.1016/j.infsof.2024.107603
      https://backend.orbit.dtu.dk/ws/files/378807468/1-s2.0-S0950584924002088-main.pdf
    • Rights:
      info:eu-repo/semantics/openAccess
    • Identifier:
      edsbas.FDD74FC5
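
The abstract describes comparing ANN classifiers by computing distance metrics between their output distributions on unlabeled operational data. The following is a minimal Python sketch of that idea, assuming each model's softmax confidences on a clean reference batch are compared against its confidences on corrupted operational inputs; the predict() method, output_shift, and rank_models names are illustrative assumptions, not the paper's implementation, and only the WD and KS metrics from the paper are shown (via scipy.stats).

import numpy as np
from scipy.stats import wasserstein_distance, ks_2samp

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def output_shift(model, reference_x, operational_x):
    # Distance between a model's confidence distributions on reference vs. operational data.
    # model.predict(...) returning class logits is an assumed, illustrative API.
    ref_conf = softmax(model.predict(reference_x)).max(axis=1)
    ops_conf = softmax(model.predict(operational_x)).max(axis=1)
    return {
        "WD": wasserstein_distance(ref_conf, ops_conf),  # Wasserstein distance
        "KS": ks_2samp(ref_conf, ops_conf).statistic,    # Kolmogorov-Smirnov statistic
    }

def rank_models(models, reference_x, operational_x, metric="WD"):
    # Rank candidate models by distributional shift (smaller shift ranked first).
    scores = {name: output_shift(m, reference_x, operational_x)[metric]
              for name, m in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

In an automated selection loop, rank_models would be re-run on each fresh batch of unlabeled operational inputs and the lowest-shift model promoted; whether a smaller distance reliably tracks robustness under the shift is exactly what the paper evaluates empirically across the RobustBench models.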