Tang, Jianhua and Chen, Fangfang and Li, Jiaping and Liu, Zilong (2024) Learn to Schedule: Data Freshness-oriented Intelligent Scheduling in Industrial IoT. IEEE Transactions on Cognitive Communications and Networking. p. 1. DOI https://doi.org/10.1109/tccn.2024.3445342 (In Press)
Abstract
In the context of the Industrial Internet of Things (IIoT), developing an accurate and timely scheduling policy is essential. Recently, the Age of Incorrect Information (AoII) has been proposed to measure both the timeliness and the accuracy of status information used for monitoring and control. In this work, we investigate a multi-sensor state-updating system in which AoII quantifies information freshness. We aim to find an optimal scheduling policy that minimizes the system-wide cost under a bandwidth constraint. We first model the source status updates monitored by the sensors as Markov chains and formulate the scheduling problem as a constrained Markov decision process (CMDP). The formulated CMDP is challenging to solve by conventional methods, owing to the heterogeneity of source status updates in IIoT and the bandwidth constraint. We therefore develop a deep-reinforcement-learning-aided framework, namely the Order-Preserving Quantization-based Constrained Reinforcement Learning algorithm with Historical Adjustment (OPQ-RL HA). Furthermore, by integrating it with the Asynchronous Advantage Actor-Critic (A3C) and the Deep Deterministic Policy Gradient (DDPG) methods, two algorithms are proposed, namely OPQ-A3C HA and OPQ-DDPG HA. Extensive numerical results demonstrate that the proposed algorithms achieve a lower average system-wide cost than the benchmark algorithms.
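The abstract describes AoII only at a high level. As a rough, minimal illustration (not code from the paper), the Python sketch below simulates the AoII of a single two-state Markov source whose monitor receives a fresh sample only when a hypothetical random scheduler selects it. The flip probability `p`, the scheduling probability `schedule_prob`, and the `simulate_aoii` helper are illustrative assumptions, not quantities defined in the article.

```python
import random

def simulate_aoii(T=1000, p=0.1, schedule_prob=0.3, seed=0):
    """Toy AoII simulation for one two-state Markov source (illustrative only).

    Assumptions: the source flips state with probability p each slot; the
    monitor's estimate is refreshed only when a random scheduler picks this
    sensor (probability schedule_prob). AoII grows by 1 in every slot where
    the monitor's estimate differs from the true state and resets to 0 once
    they agree.
    """
    rng = random.Random(seed)
    true_state, estimate = 0, 0
    aoii, total = 0, 0
    for _ in range(T):
        # Source evolves as a symmetric two-state Markov chain.
        if rng.random() < p:
            true_state ^= 1
        # Scheduling decision: deliver this sensor's update or stay silent.
        if rng.random() < schedule_prob:
            estimate = true_state
        # AoII update: penalize every slot the estimate is incorrect.
        aoii = 0 if estimate == true_state else aoii + 1
        total += aoii
    return total / T

if __name__ == "__main__":
    print("Average AoII:", simulate_aoii())
```

In the paper's multi-sensor setting, a learned policy (OPQ-A3C HA or OPQ-DDPG HA) replaces the random scheduler above and must respect the bandwidth constraint across all sensors; this sketch only conveys how the AoII metric itself evolves.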
| Item Type: | Article |
| --- | --- |
| Uncontrolled Keywords: | Age of Incorrect Information, Data Freshness, Deep Reinforcement Learning, Constrained Markov Decision Process, Industrial Internet of Things |
| Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 19 Aug 2024 11:00 |
| Last Modified: | 31 Oct 2024 03:03 |
| URI: | http://repository.essex.ac.uk/id/eprint/39005 |
Available files
Filename: Learn-to-Schedule-FINAL.pdf