Research Repository

Inter-cluster Thread-to-core Mapping and DVFS on Heterogeneous Multi-cores

Reddy, BK and Singh, AK and Biswas, D and Merrett, GV and Al-Hashimi, BM (2018) 'Inter-cluster Thread-to-core Mapping and DVFS on Heterogeneous Multi-cores.' IEEE Transactions on Multi-Scale Computing Systems, 4 (3). pp. 369-382. ISSN 2332-7766

TMSCS - Camera-Ready - siteupload.pdf - Accepted Version


Heterogeneous multi-core platforms that contain different types of cores, organized as clusters, are emerging, e.g. ARM's big.LITTLE architecture. These platforms often need to deal with multiple applications, each with different performance requirements, executing concurrently. Because the applications share resources, this leads to varying and mixed workloads (e.g. compute- and memory-intensive). Run-time management is required to adapt to such performance requirements and workload variability and to achieve energy efficiency. Moreover, the management becomes challenging when the applications are multi-threaded and the heterogeneity needs to be exploited. Existing run-time management approaches do not efficiently exploit cores situated in different clusters simultaneously (referred to as inter-cluster exploitation) or the DVFS potential of cores; doing so is the aim of this paper. Such exploitation might help to satisfy the performance requirement while achieving energy savings at the same time. Therefore, in this paper, we propose a run-time management approach that first selects a thread-to-core mapping based on the performance requirements and resource availability. Then, it applies online adaptation by adjusting the voltage-frequency (V-f) levels to achieve energy optimization without trading off application performance. For thread-to-core mapping, offline profiled results are used, which contain performance and energy characteristics of applications when executed on the heterogeneous platform using different types of cores in various possible combinations. For an application, the thread-to-core mapping process defines the number of cores used and their types, where the cores are situated in different clusters. The online adaptation process classifies the inherent workload characteristics of concurrently executing applications, incurring a lower overhead than existing learning-based approaches, as demonstrated in this paper. The workload is classified using the metric Memory Reads Per Instruction (MRPI). The adaptation process proactively selects an appropriate V-f pair for a predicted workload. Subsequently, it monitors the workload prediction error and performance loss, quantified by instructions per second (IPS), and adjusts the chosen V-f to compensate. We validate the proposed run-time management approach on a hardware platform, the Odroid-XU3, with various combinations of multi-threaded applications from the PARSEC and SPLASH benchmarks. Results show an average improvement in energy efficiency of up to 33% compared to existing approaches while meeting the performance requirements.
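
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the profile entries, the MRPI thresholds, the frequency table, and every function name below are hypothetical, and the rule that a more memory-intensive workload (higher MRPI) tolerates a lower frequency is an assumption made for illustration. The sketch first picks the lowest-energy profiled mapping that meets an IPS requirement and fits the free cores, then chooses a frequency from the MRPI class and steps it up by one level if the measured IPS falls short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileEntry:
    """One offline-profiled mapping (hypothetical record layout)."""
    big: int          # number of big cores used
    little: int       # number of LITTLE cores used
    ips: float        # profiled instructions per second
    energy_j: float   # profiled energy in joules


def select_mapping(profile, required_ips, free_big, free_little):
    """Return the lowest-energy entry that meets the IPS target and fits the free cores."""
    feasible = [
        e for e in profile
        if e.ips >= required_ips and e.big <= free_big and e.little <= free_little
    ]
    return min(feasible, key=lambda e: e.energy_j) if feasible else None


def mrpi(memory_reads, instructions):
    """Memory Reads Per Instruction; 0.0 when no instructions were retired."""
    return memory_reads / instructions if instructions else 0.0


# Hypothetical frequency levels (MHz, ascending) and MRPI class boundaries.
FREQS_MHZ = [600, 1000, 1400, 1800]
MRPI_BINS = [(0.02, 1800), (0.05, 1400), (0.10, 1000)]


def frequency_for(predicted_mrpi):
    """Map a predicted MRPI to a frequency level (assumed: higher MRPI, lower frequency)."""
    for upper_bound, freq in MRPI_BINS:
        if predicted_mrpi < upper_bound:
            return freq
    return FREQS_MHZ[0]


def compensate(freq, measured_ips, required_ips):
    """Raise the frequency by one level if the measured IPS misses the requirement."""
    if measured_ips >= required_ips:
        return freq
    idx = FREQS_MHZ.index(freq)
    return FREQS_MHZ[min(idx + 1, len(FREQS_MHZ) - 1)]


if __name__ == "__main__":
    profile = [
        ProfileEntry(big=4, little=0, ips=4.0e9, energy_j=10.0),
        ProfileEntry(big=2, little=2, ips=3.0e9, energy_j=7.0),
        ProfileEntry(big=0, little=4, ips=1.5e9, energy_j=4.0),
    ]
    chosen = select_mapping(profile, required_ips=2.5e9, free_big=4, free_little=4)
    freq = frequency_for(mrpi(memory_reads=30, instructions=1000))
    freq = compensate(freq, measured_ips=2.0e9, required_ips=2.5e9)
    print(chosen, freq)
```

On a real platform the memory-read and instruction counts would come from hardware performance counters and the chosen level would be applied through the operating system's frequency-scaling interface; both are outside the scope of this sketch.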

Item Type: Article
Uncontrolled Keywords: Heterogeneous multi-cores; Multi-threaded applications; Run-time management; Performance; Energy consumption
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health
Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor: Elements
Depositing User: Elements
Date Deposited: 19 Feb 2018 14:49
Last Modified: 15 Jan 2022 01:22