Weak Labelling for File-level Source Code Classification

  • Authors: Sas, Cezar; Capiluppi, Andrea
  • Source:
    Sas, C & Capiluppi, A 2023, Weak Labelling for File-level Source Code Classification. in T Zhang, X Xia & N Novielli (eds), Proceedings - 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2023. Institute of Electrical and Electronics Engineers Inc., pp. 698-702, 30th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2023, Macao, China, 21/03/2023. https://doi.org/10.1109/SANER56733.2023.00074
  • Subject:
  • Record type:
    article in journal/newspaper
  • Language:
    English
  • Additional information
    • Contributors:
      Zhang, Tao; Xia, Xin; Novielli, Nicole
    • Publisher information:
      Institute of Electrical and Electronics Engineers Inc.
    • Publication year:
      2023
    • Collection:
      University of Groningen research database
    • Abstract:
      Software repository hosting services contain large amounts of open-source software, with GitHub hosting over 200 million repositories, from new to established ones. However, these repositories are not easy to find, calling for various attempts to classify their application domains automatically. However, most proposed approaches use artifacts, like README files, as a proxy for the project, losing the information in the source code and the interaction between files. Furthermore, they all focus on the project level, ignoring the decomposition of software projects into components and modules. This work presents a weak labelling approach based on keyword extraction to annotate source files in a software project. Our findings suggest that using keywords to perform file-level annotations is an effective approach that can capture enough information from the source file so that new labels can be predicted. The long-term goal of our research is to classify source code files and use these annotations to identify semantic components in software projects. In addition, these annotations can be used for semantic reverse engineering, software reuse, and more. We plan to train machine learning models that use our proposed weak supervision to better annotate source files inside software projects.
    • File Description:
      application/pdf
    • ISBN:
      978-1-66545-278-6
      1-66545-278-1
    • Relation:
      urn:ISBN:9781665452786
    • Identifier:
      10.1109/SANER56733.2023.00074
    • Online access:
      https://hdl.handle.net/11370/3a266d78-ef50-47e0-b3fd-ad61643e1fd1
      https://research.rug.nl/en/publications/3a266d78-ef50-47e0-b3fd-ad61643e1fd1
      https://doi.org/10.1109/SANER56733.2023.00074
      https://pure.rug.nl/ws/files/744460647/Weak_Labelling_for_File-level_Source_Code_Classification.pdf
      http://www.scopus.com/inward/record.url?scp=85160512827&partnerID=8YFLogxK
    • Rights:
      info:eu-repo/semantics/openAccess
    • Identifier:
      edsbas.B9E2448