
A novel framework for generic Spark workload characterization and similar pattern recognition using machine learning

  • Authors: Garralda-Barrio, Mariano; Eiras-Franco, Carlos; Bolón-Canedo, Verónica
  • Record Type:
    Electronic Resource
  • Online Access:
    http://hdl.handle.net/2183/36284
    https://doi.org/10.1016/j.jpdc.2024.104881
    info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-109238GB-C22/ES/APRENDIZAJE AUTOMATICO ESCALABLE Y EXPLICABLE
  • Additional Information
    • Publisher Information:
      Elsevier 2024-07
    • Abstract:
      [Abstract]: Comprehensive workload characterization plays a pivotal role in comprehending Spark applications, as it enables the analysis of diverse aspects and behaviors. This understanding is indispensable for devising downstream tuning objectives, such as performance improvement. To address this pivotal issue, our work introduces a novel and scalable framework for generic Spark workload characterization, complemented by consistent geometric measurements. The presented approach aims to build robust workload descriptors by profiling only quantitative metrics at the application task-level, in a non-intrusive manner. We expand our framework for downstream workload pattern recognition by incorporating unsupervised machine learning techniques: clustering algorithms and feature selection. These techniques significantly improve the process of grouping similar workloads without relying on predefined labels. We effectively recognize 24 representative Spark workloads from diverse domains, including SQL, machine learning, web search, graph, and micro-benchmarks, available in HiBench. Our framework achieves a high accuracy F-Measure score of up to 90.9% and a Normalized Mutual Information of up to 94.5% in similar workload pattern recognition. These scores significantly outperform the results obtained in a comparative analysis with an established workload characterization approach in the literature.
    • Subject:
    • Availability:
      Open access content
      http://creativecommons.org/licenses/by-nc/3.0/es
      info:eu-repo/semantics/openAccess
      Atribución-NoComercial 3.0 España
    • Note:
      http://hdl.handle.net/2183/36284
      10.1016/j.jpdc.2024.104881
      M. Garralda-Barrio, C. Eiras-Franco, and V. Bolón-Canedo, "A novel framework for generic Spark workload characterization and similar pattern recognition using machine learning", Journal of Parallel and Distributed Computing, Vol. 189, 104881, Jul. 2024, doi: 10.1016/j.jpdc.2024.104881
      English
    • Other Numbers:
      ESUDC oai:ruc.udc.es:2183/36284
      10.1016/j.jpdc.2024.104881
      1439650253
    • Contributing Source:
      UNIV DE A CORUNA
      From OAIster®, provided by the OCLC Cooperative.
    • Accession Number:
      edsoai.on1439650253