Abstract: Notebooks are general-purpose programming platforms widely used in machine learning (ML), artificial intelligence (AI), data science, and data analytics across almost every science and engineering field. Although Notebooks support a wide range of disciplines, a dominant application of production Notebook workloads is interactive ML training (IMLT). To guarantee high interactivity, modern Notebook services typically allocate and reserve GPU resources for actively running Notebook sessions. These sessions are long-running but use GPUs only intermittently and sporadically. Consequently, for most of their lifetimes, Notebook sessions leave their reserved GPUs idle, resulting in extremely low GPU utilization and prohibitively high cost. This project aims to build a new Notebook platform for IMLT workloads that addresses these issues. If successful, the project will provide an efficient and interactive Notebook platform that significantly reduces GPU resource wastage. The project will also advance understanding of large-scale cluster computing systems and yield insights into achieving high carbon efficiency and sustainability in large-scale GPU computing infrastructure.