Document Details

Document Type : Thesis 
Document Title :
A PARALLEL HPC-BASED RESOURCE MANAGEMENT SYSTEM FOR BIG DATA APPLICATIONS
نظام إدارة موارد متوازي قائم على الحوسبة عالية الأداء لتطبيقات البيانات الكبيرة
 
Subject : Faculty of Computing and Information Technology 
Document Language : Arabic 
Abstract : The amount of data produced in scientific and commercial fields is growing dramatically. Correspondingly, big data technologies such as Hadoop and Spark have emerged to tackle the challenges of collecting, processing, and storing such large-scale data. While big data applications focus on handling enormous datasets, high-performance computing (HPC) focuses on performing computations as fast as possible. This is achieved by integrating heterogeneous hardware and crafting software and algorithms to exploit the parallelism provided by HPC. The performance capabilities afforded by HPC have made it an attractive environment for supporting scientific workflows and big data computing, which has led to a convergence of the HPC and big data fields. Unfortunately, big data applications usually perform poorly on HPC clusters because they are written in high-level programming languages. Such languages may be lacking in terms of performance and, in contrast to parallel programming models such as the Message Passing Interface (MPI), may not encourage or support writing highly parallel programs. Furthermore, big data platforms are designed as distributed architectures, which differ from the architecture of HPC clusters. Conversely, the large volume of big data may hinder parallel programming models such as MPI, Open Multi-Processing (OpenMP), and accelerator models (CUDA, OpenACC, OpenCL) from supporting high levels of parallelism. Given these problems, there is a need to reduce the performance gap between HPC and big data applications while minimizing power consumption. To this end, this thesis puts forward the following research question: How can the performance of big data applications be enhanced on HPC clusters without increasing power consumption? A Hybrid Spark MPI OpenACC (HSMO) system is presented in this thesis as an answer to this question.
HSMO integrates Spark as a big data programming model with MPI and OpenACC as parallel programming models. This integration brings together the advantages of each programming model and provides greater effectiveness. To enhance performance without increasing power consumption, the integration approach needs to exploit the hardware infrastructure intelligently. To do so, a mapping technique is proposed that is based on the application’s virtual topology and the physical topology of the HPC resources. The approach presented in this thesis contributes to the domains of high-performance computing and big data and, more specifically, to resource management of HPC clusters, as well as to data locality and the management of big data. The main contributions of this thesis are the novel integration and mapping approach itself, which supports big data applications on HPC clusters; the prototype implementation, called HSMO, which demonstrates the viability of the proposed approach; and a literature survey of relevant state-of-the-art research.
Supervisor : Prof. Maher Khemakhem 
Thesis Type : Doctorate Thesis 
Publishing Year : 1441 AH / 2019 AD
Co-Supervisor : Dr. Abdullah Basuhail 
Added Date : Monday, December 9, 2019 

Researchers

Researcher Name (Arabic) : وليد عبدالله الشهري
Researcher Name (English) : Al Shehri, Waleed Abdullah
Researcher Type : Researcher
Dr Grade : Doctorate

Files

File Name : 45657.pdf
Type : pdf
