نبذة مختصرة : In parallel computing, obtaining maximal performance is often mandatory to solve large and complex problems. However, this is often hard to achieve due to multiple performance degradation factors. Among them, load imbalance, which describes the uneven distribution of work among processing elements, is one of the most critical. Load balancing techniques are used to distribute the workload evenly while minimizing the communications, to maximize performance. A key challenge is to know when to use load balancing techniques, and how to distribute the workload. The former is done through load balancing criteria, which trigger load balancing during application execution, while the latter depends on the partitioning algorithm used inside the load balancing algorithm. This thesis investigates the combined impact of the load balancing criteria and the partitioning algorithm on the application performance. In the first part of this work, we study when to use load balancing algorithms and extend the theory around it. In particular, we propose a mathematical model of the CPU time of load balanced parallel applications. We use this model to derive an optimal automatic load balancing criterion shown in our numerical experiments to increase the performance of our particles simulation application by up to 17.6% compared to state-of-the-art criteria. Then, we develop an algorithm that can retrieve in polynomial time the indexes of the optimal iterations at which the load balancer must be invoked to maximize the application performance. Until now, this task has never been done in less than an exponential time. In a second part, we address the lack of anticipation and adaptivity of classical partitioning algorithms using two approaches. The first anticipates the growth of the load imbalance by giving less work to the processing elements that are measured to overload. In our numerical study, we show that it can increase the performance of our parallel stochastic erosion simulator by up to 16%. The second approach analyzes the evolution of the computation to drive the division of the computational domain to reduce the migration of computational elements over time. As a result, we observe up to 76% performance improvement in our numerical test cases. Both of these concepts can reduce the number of times the load balancer has to be invoked and the second method also mitigates the growth of the load imbalance. Finally, we introduce a metric that effectively measures all aspects of load balancing: its cost, its imbalance correcting capability, and its impact on the load imbalance growth. Ideally, this metric could be used to rank load balancing algorithms and select them on-the-fly.
No Comments.