Research Repository

Bubble budgeting: throughput optimization for dynamic workloads by exploiting dark cores in many core systems

Wang, X and Singh, AK and Li, B and Yang, Y and Li, H and Mak, T (2018) 'Bubble budgeting: throughput optimization for dynamic workloads by exploiting dark cores in many core systems.' IEEE Transactions on Computers, 67 (2). 178 - 192. ISSN 0018-9340

BubbleBudgeting.pdf - Accepted Version

Abstract

Not all cores of a many-core chip can be active at the same time, owing to factors such as low CPU utilization in server systems and the limited power budget of the dark silicon era. These free cores (referred to as bubbles) can be placed near active cores to dissipate heat, so that the active cores can run at a higher frequency level, boosting the performance of the applications running on them. Budgeting inactive cores (bubbles) to applications in this way poses three challenges. First, the number of bubbles varies under open workloads, where new applications arrive at runtime. Second, inserting a bubble between two communicating tasks (a task is a thread or process of a parallel application) increases their communication distance, degrading performance. Third, budgeting too many bubbles as coolers for running applications leaves too few cores for future applications. To address these challenges, this paper proposes a bubble budgeting scheme that budgets free cores to each application so as to optimize the throughput of the whole system. System throughput depends on the execution time of each application and the waiting time incurred by newly arrived applications. The proposed algorithm first determines the number and locations of bubbles to optimize the performance and waiting time of each application, and then maps the tasks of each application to a core region. As the last step, a Rollout algorithm budgets power to the cores. Experiments show that our approach achieves 50 percent higher throughput than state-of-the-art thermal-aware runtime task mapping approaches. The runtime overhead of the proposed algorithm is on the order of 1M cycles, making it an efficient runtime task management method for large-scale many-core systems.
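
The abstract names a Rollout algorithm as the final power-budgeting step but gives no details of it. The following is a minimal sketch of the generic rollout technique (fix one decision at a time, scoring each candidate by the value of a greedy "base heuristic" completion) applied to a toy version of that step. Everything concrete here is a hypothetical assumption, not the paper's formulation: the DVFS table `FREQ`/`POWER`, the per-core `weights` used as the objective, the total power `budget`, and the per-core level cap `max_level`, which stands in for the thermal headroom that neighbouring bubbles might provide.

```python
from typing import List, Optional, Sequence

# HYPOTHETICAL DVFS table (illustrative values, not from the paper):
# level index -> frequency in GHz and power in W.
FREQ: List[float] = [1.0, 1.5, 2.0, 2.5]
POWER: List[float] = [1.0, 1.8, 3.0, 4.5]
MIN_POWER = POWER[0]


def value(levels: Sequence[int], weights: Sequence[float]) -> float:
    """Toy objective (assumption): weighted sum of core frequencies."""
    return sum(w * FREQ[l] for w, l in zip(weights, levels))


def base_heuristic(prefix: Sequence[int], max_level: Sequence[int],
                   budget: float) -> Optional[List[int]]:
    """Greedy completion of a partial assignment.

    Each remaining core gets the highest level allowed by its (hypothetical)
    thermal cap that still reserves MIN_POWER for every later core.
    Returns None if the prefix alone already breaks the budget.
    """
    n = len(max_level)
    levels = list(prefix)
    used = sum(POWER[l] for l in levels)
    if used + MIN_POWER * (n - len(levels)) > budget:
        return None
    for j in range(len(levels), n):
        reserve = MIN_POWER * (n - j - 1)
        lvl = 0  # always affordable thanks to the reservation check
        for l in range(1, max_level[j] + 1):
            if used + POWER[l] + reserve <= budget:
                lvl = l
        levels.append(lvl)
        used += POWER[lvl]
    return levels


def rollout(weights: Sequence[float], max_level: Sequence[int],
            budget: float) -> List[int]:
    """Assign levels core by core; each candidate level is scored by the
    value of its greedy completion, and the best-scoring level is kept."""
    fixed: List[int] = []
    for i in range(len(weights)):
        best_val, best_lvl = float("-inf"), None
        for lvl in range(max_level[i] + 1):
            completed = base_heuristic(fixed + [lvl], max_level, budget)
            if completed is None:
                continue
            v = value(completed, weights)
            if v > best_val:
                best_val, best_lvl = v, lvl
        if best_lvl is None:
            raise ValueError("power budget too small for all cores")
        fixed.append(best_lvl)
    return fixed


if __name__ == "__main__":
    weights = [3.0, 1.0, 2.0, 1.0]   # hypothetical per-core benefit
    max_level = [3, 1, 3, 2]         # hypothetical caps (more bubbles -> higher)
    greedy = base_heuristic([], max_level, 9.0)
    print(greedy, value(greedy, weights))   # [3, 1, 0, 0] 12.0
    best = rollout(weights, max_level, 9.0)
    print(best, value(best, weights))       # [2, 0, 2, 1] 12.5
```

Because this greedy base heuristic is deterministic and depends only on the power already spent and the core index, each rollout step keeps the base heuristic's own choice among its candidates, so the rollout result is never worse than the plain greedy assignment. In the toy run it trades the top level on core 0 for higher levels on cores 2 and 3.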

Item Type: Article
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Elements
Date Deposited: 06 Aug 2018 09:50
Last Modified: 06 Aug 2018 10:15
URI: http://repository.essex.ac.uk/id/eprint/21111
