Deanship of Graduate Studies

Document Details

Document Type	:	Thesis
Document Title	:	A PARALLEL HPC-BASED RESOURCE MANAGEMENT SYSTEM FOR BIG DATA APPLICATIONS نظام إدارة موارد متوازي قائم على الحوسبة عالية الأداء لتطبيقات البيانات الكبيرة
Subject	:	Faculty of Computing and Information Technology
Document Language	:	Arabic
Abstract	:	The amount of data produced in scientific and commercial fields is growing dramatically. In response, big data technologies such as Hadoop and Spark have emerged to tackle the challenges of collecting, processing, and storing such large-scale data. While big data applications focus on handling enormous datasets, high-performance computing (HPC) focuses on performing computations as fast as possible. This is achieved by integrating heterogeneous hardware and crafting software and algorithms to exploit the parallelism that HPC provides. The performance capabilities of HPC have made it an attractive environment for supporting scientific workflows and big data computing, which has led to a convergence of the HPC and big data fields. Unfortunately, big data applications usually suffer a performance penalty on HPC clusters because they are written in high-level programming languages. Such languages may lag in performance and may not encourage or support writing highly parallel programs, in contrast to parallel programming models such as the Message Passing Interface (MPI). Furthermore, big data platforms are designed around a distributed architecture, which differs from the architecture of HPC clusters. Conversely, the sheer volume of big data may hinder parallel programming models such as MPI, Open Multi-Processing (OpenMP), and accelerator models (CUDA, OpenACC, OpenCL) from supporting high levels of parallelism. Given these problems, there is a need to reduce the performance gap between HPC and big data applications while minimizing power consumption. To this end, this thesis poses the following research question: How can the performance of big data applications be enhanced on HPC clusters without sacrificing power efficiency? This thesis presents a Hybrid Spark MPI OpenACC (HSMO) system as an answer to this question.
HSMO integrates Spark, as a big data programming model, with MPI and OpenACC, as parallel programming models. This integration combines the advantages of each programming model and provides greater effectiveness. To enhance performance without sacrificing power efficiency, the integration approach needs to exploit the hardware infrastructure intelligently. To that end, a mapping technique is proposed that is based on the application's virtual topology and the physical topology of the HPC resources. The approach presented in this thesis contributes to the domains of high-performance computing and big data and, more specifically, to resource management of HPC clusters, as well as to data locality and the management of big data. The main contributions of this thesis are the novel integration and mapping approach itself, which supports big data applications on HPC clusters; the prototype implementation, called HSMO, which demonstrates the viability of the proposed approach; and a literature survey of relevant state-of-the-art research.
Supervisor	:	Prof. Maher Khemakhem
Thesis Type	:	Doctorate Thesis
Publishing Year	:	1441 AH / 2019 AD
Co-Supervisor	:	Dr. Abdullah Basuhail
Added Date	:	Monday, December 9, 2019
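
Illustrative note: the abstract above mentions a mapping technique based on the application's virtual topology and the physical topology of the HPC resources, but it does not give the algorithm. The Python sketch below shows one generic way such a mapping could be approached: a greedy heuristic that tries to place heavily communicating processes on the same node. It is not the HSMO implementation. The function names `greedy_map` and `inter_node_traffic`, the traffic-dictionary input format, and the flat "nodes with a fixed number of slots" model of the cluster are all assumptions made for this example.

```python
def greedy_map(traffic, num_nodes, slots_per_node):
    """Map processes to nodes, co-locating heavy communicators (sketch).

    traffic: dict {(p, q): volume} describing the virtual topology
             (hypothetical input format chosen for this example).
    num_nodes, slots_per_node: a simplified flat model of the
             physical topology (assumption).
    Returns a dict {process: node_index}.
    """
    processes = {p for pair in traffic for p in pair}
    if len(processes) > num_nodes * slots_per_node:
        raise ValueError("not enough slots for all processes")

    free = [slots_per_node] * num_nodes  # remaining slots per node
    placement = {}

    def most_free():
        # Node with the most remaining slots (lowest index wins ties).
        return max(range(num_nodes), key=lambda n: free[n])

    def place(proc, node):
        placement[proc] = node
        free[node] -= 1

    # Visit communicating pairs from heaviest to lightest traffic.
    ordered = sorted(traffic.items(), key=lambda kv: (-kv[1], kv[0]))
    for (p, q), _volume in ordered:
        for proc, partner in ((p, q), (q, p)):
            if proc in placement:
                continue
            # Prefer the partner's node; otherwise use the emptiest node.
            node = placement.get(partner)
            if node is None or free[node] == 0:
                node = most_free()
            place(proc, node)
    return placement


def inter_node_traffic(traffic, placement):
    """Total volume exchanged between processes on different nodes."""
    return sum(v for (p, q), v in traffic.items()
               if placement[p] != placement[q])


# Example: four processes, two nodes with two slots each.
traffic = {(0, 1): 10, (2, 3): 8, (1, 2): 1, (0, 3): 1}
placement = greedy_map(traffic, num_nodes=2, slots_per_node=2)
print(placement, inter_node_traffic(traffic, placement))
```

A greedy pass like this is only a baseline. A real resource manager would also need to account for the network hierarchy, accelerators, and data locality, which this sketch ignores.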

Researchers

Researcher Name (Arabic)	Researcher Name (English)	Researcher Type	Dr Grade	Email
وليد عبدالله الشهري	Al Shehri, Waleed Abdullah	Researcher	Doctorate

Files

File Name	Type	Description
45657.pdf	pdf
