Item request has been placed! ×
Item request cannot be made. ×
loading  Processing Request

Taming data locality for task scheduling under memory constraint in runtime systems ; Ordonnancement sous contrainte mémoire en domptant la localité des données dans un modèle de programmation à base de tâches

Item request has been placed! ×
Item request cannot be made. ×
loading   Processing Request
  • معلومة اضافية
    • Contributors:
      STatic Optimizations, Runtime Methods (STORM); Laboratoire Bordelais de Recherche en Informatique (LaBRI); Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest; Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria); Optimisation des ressources : modèles, algorithmes et ordonnancement (ROMA); Laboratoire de l'Informatique du Parallélisme (LIP); École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL); Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure de Lyon (ENS de Lyon)-Université Claude Bernard Lyon 1 (UCBL); Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Lyon; Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS); Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS); ANR-19-CE46-0009,SOLHARIS,Solveurs pour architectures hétérogènes utilisant des supports d'exécution, objectif scalabilité(2019)
    • بيانات النشر:
      HAL CCSD
      Elsevier
    • الموضوع:
      2023
    • Collection:
      HAL Lyon 1 (University Claude Bernard Lyon 1)
    • نبذة مختصرة :
      International audience ; A now-classical way of meeting the increasing demand for computing speed by HPC applications is the use of GPUs and/or otheraccelerators. Such accelerators have their own memory, which is usually quite limited, and are connected to the main memorythrough a bus with bounded bandwidth. Thus, particular care should be devoted to data locality in order to avoid unnecessary datamovements. Task-based runtime schedulers have emerged as a convenient and efficient way to use such heterogeneous platforms.When processing an application, the scheduler has the knowledge of all tasks available for processing on a GPU, as well astheir input data dependencies. Hence, it is possible to produce a tasks processing order aiming at reducing the total processingtime through three objectives: minimizing data transfers, overlapping transfers and computation and optimizing the eviction ofpreviously-loaded data. In this paper, we focus on how to schedule tasks that share some of their input data (but are otherwiseindependent) on a single GPU. We provide a formal model of the problem, exhibit an optimal eviction strategy, and show thatordering tasks to minimize data movement is NP-complete. We review and adapt existing ordering strategies to this problem,and propose a new one based on task aggregation. We prove that the underlying problem of this new strategy is NP-complete,and prove the reasonable complexity of our proposed heuristic. These strategies have been implemented in the StarPU runtimesystem. We present their performance on tasks from tiled 2D, 3D matrix products, Cholesky factorization, randomized task order,randomized data pairs from the 2D matrix product as well as a sparse matrix product. We introduce a visual way to understandthese performance and lower bounds on the number of data loads for the 2D and 3D matrix products. Our experiments demonstratethat using our new strategy together with the optimal eviction policy reduces the amount of data movement as well as the totalprocessing time.
    • Relation:
      hal-03623220; https://inria.hal.science/hal-03623220; https://inria.hal.science/hal-03623220v2/document; https://inria.hal.science/hal-03623220v2/file/fgcs-paper-reviewers-round2-unmarked-submitted.pdf
    • الرقم المعرف:
      10.1016/j.future.2023.01.024
    • Rights:
      info:eu-repo/semantics/OpenAccess
    • الرقم المعرف:
      edsbas.D4E3040B