On the Use of Statistical Machine Translation for Suggesting Variable Names for Decompiled Code: The Pharo Case

  • Additional Information
    • Contributors:
      Pontificia Universidad Católica de Chile (UC); Universidad Católica Boliviana (UCB); Reflective Evolution of Ever-running Software Systems (EVREF); Inria Lille - Nord Europe; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lille-Berger-Levrault-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL); Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)
    • Publication Information:
      HAL CCSD
      Elsevier
    • Subject:
      2024
    • Collection:
      LillOA (HAL Lille Open Archive, Université de Lille)
    • Abstract:
      International audience ; Adequately selecting variable names is a difficult activity for practitioners. In 2018, Jaffe et al. proposed the use of statistical machine translation (SMT) to suggest descriptive variable names for decompiled code. A large corpus of decompiled C code was used to train the SMT model. Our paper presents the results of a partial replication of Jaffe’s experiment. We apply the same technique and methodology to a dataset made of code written in the Pharo programming language. We selected Pharo since its syntax is simple - it fits on half of a postcard - and because the optimizations performed by the compiler are limited to method scope. Our results indicate that SMT may recover between 8.9% and 69.88% of the variable names depending on the training set. Our replication concludes that: (i) the accuracy depends on the code similarity between the training and testing sets; (ii) the simplicity of the Pharo syntax and the satisfactory decompiled code alignment have a positive impact on predicting variable names; and (iii) a relatively small code corpus is sufficient to train the SMT model, which shows the applicability of the approach to less popular programming languages. Additionally, to assess SMT’s potential in improving original variable names, ten Pharo developers reviewed 400 SMT name suggestions, with four reviews per variable. Only 15 suggestions (3.75%) were unanimously viewed as improvements, while 45 (11.25%) were perceived as improvements by at least two reviewers, highlighting SMT’s limitations in providing suitable alternatives.
    • Relation:
      hal-04564690; https://inria.hal.science/hal-04564690; https://inria.hal.science/hal-04564690/document; https://inria.hal.science/hal-04564690/file/Sando24a-COLA_SMT_2023_Preprint.pdf
    • Online Access:
      https://inria.hal.science/hal-04564690
      https://inria.hal.science/hal-04564690/document
      https://inria.hal.science/hal-04564690/file/Sando24a-COLA_SMT_2023_Preprint.pdf
    • Rights:
      http://creativecommons.org/licenses/by/ ; info:eu-repo/semantics/OpenAccess
    • Accession Number:
      edsbas.603B71D6