Abstract (Ph.D. thesis):

Distributed data analytics processes and analyzes large, diverse, and distributed data sets. Systems for distributed data analytics have become increasingly important in recent years because they give researchers and practitioners the ability to cope with rapidly growing volumes of data. In this thesis, we identified several limitations of existing distributed data analytics systems in terms of expressiveness, efficiency, and scalability, and proposed three novel systems with new programming and execution models.

First, we identified the lack of support for dynamic parallelism in the existing parameter server architecture for distributed machine learning. We proposed a new system, FlexPS, which uses a novel multi-stage abstraction to support flexible parallelism control.

Second, we designed a new programming paradigm named MapUpdate, which combines the benefits of both immutable and mutable abstractions for distributed data analytics. We implemented MapUpdate in a system called Tangram, which improves on state-of-the-art distributed data analytics systems in both expressiveness and efficiency.

Third, we studied how distributed data analytics systems can scale more efficiently and economically to geographically distributed data centers, each with tens of thousands of machines. We designed and implemented Yugong, a system running in production at Alibaba, which reduces cross-data-center bandwidth usage by tens of PBs per day.

Huang, Yuzhen. Thesis (Ph.D.), Chinese University of Hong Kong, 2019. ...