Neural-based test oracle generation: a large-scale evaluation and lessons learned

  • Authors: Hossain, SB; Filieri, A; Dwyer, MB; Elbaum, S; Visser, W
  • Source:
    ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) ; 132 ; 120
  • Record type:
    conference object
  • Language:
    unknown
  • Additional information
    • Publication data:
      ACM
    • Subject:
      2023
    • Collection:
      Imperial College London: Spiral
    • Subject:
    • Abstract:
      Defining test oracles is crucial and central to test development, but manual construction of oracles is expensive. While recent neural-based automated test oracle generation techniques have shown promise, their real-world effectiveness remains a compelling question requiring further exploration and understanding. This paper investigates the effectiveness of TOGA, a recently developed neural-based method for automatic test oracle generation. TOGA utilizes EvoSuite-generated test inputs and generates both exception and assertion oracles. In a Defects4J study, TOGA outperformed specification, search, and neural-based techniques, detecting 57 bugs, including 30 unique bugs not detected by other methods. To gain a deeper understanding of its applicability in real-world settings, we conducted a series of external, extended, and conceptual replication studies of TOGA. In a large-scale study involving 25 real-world Java systems, 23.5K test cases, and 51K injected faults, we evaluate TOGA’s ability to improve fault-detection effectiveness relative to the state-of-the-practice and the state-of-the-art. We find that TOGA misclassifies the type of oracle needed 24% of the time and that, when it classifies correctly, around 62% of the time it is not confident enough to generate any assertion oracle. When it does generate an assertion oracle, more than 47% of them are false positives, and the true positive assertions only increase fault detection by 0.3% relative to prior work. These findings expose limitations of the state-of-the-art neural-based oracle generation technique, provide valuable insights for improvement, and offer lessons for evaluating future automated oracle generation methods.
    • ISBN:
      979-84-00-70327-0
    • Relation:
      ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering; http://hdl.handle.net/10044/1/107374
    • Identifier:
      10.1145/3611643.3616265
    • Online access:
      https://doi.org/10.1145/3611643.3616265
      http://hdl.handle.net/10044/1/107374
    • Rights:
      © 2023 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)
    • Identifier:
      edsbas.51459CF5