Saber, Alireza and Hosseini, Mohammad-Mehdi and Fateh, Amirreza and Fateh, Mansoor and Abolghasemi, Vahid (2026) Lightweight Multi-Scale Framework for Human Pose and Action Classification. Sensors, 26 (4). p. 1102. DOI https://doi.org/10.3390/s26041102
Abstract
Human pose classification, along with related tasks such as action recognition, is a crucial area in deep learning due to its wide range of applications in assisting human activities. Despite significant progress, it remains a challenging problem because of high inter-class similarity, dataset noise, and the large variability in human poses. In this paper, we propose a lightweight yet highly effective modular attention-based architecture for human pose classification, built upon a Swin Transformer backbone for robust multi-scale feature extraction. The proposed design integrates a Spatial Attention Module, a Context-Aware Channel Attention Module, and a novel Dual Weighted Cross Attention Module, enabling effective fusion of spatial and channel-wise cues. Additionally, explainable AI techniques are employed to improve the reliability and interpretability of the model. We train and evaluate our approach on two distinct datasets: Yoga-82 (in both main-class and subclass configurations) and Stanford 40 Actions. Experimental results show that our model outperforms state-of-the-art baselines across accuracy, precision, recall, F1-score, and mean average precision, while maintaining an extremely low parameter count of only 0.79 million. Specifically, our method achieves accuracies of 90.40% and 87.44% for the 6-class and 20-class Yoga-82 configurations, respectively, and 94.28% for the Stanford 40 Actions dataset.
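The record does not include the paper's code. As a rough, hedged illustration of the kind of spatial-attention gating the abstract mentions (not the authors' actual module — the function name and pooling choices below are hypothetical), a minimal NumPy sketch might look like:

```python
import numpy as np

def sigmoid(x):
    """Standard logistic function, used to squash gate values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(feat):
    """Toy spatial attention over a feature map of shape (C, H, W).

    Pools across the channel axis (average and max), combines the two
    pooled maps into a per-location gate, and reweights every channel
    at each spatial position by that gate. This is only a sketch of the
    general technique, not the paper's Spatial Attention Module.
    """
    avg_map = feat.mean(axis=0)           # (H, W) channel-average pooling
    max_map = feat.max(axis=0)            # (H, W) channel-max pooling
    gate = sigmoid(avg_map + max_map)     # per-location weight in (0, 1)
    return feat * gate[None, :, :]        # broadcast gate over channels

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))        # 8 channels, 4x4 spatial grid
y = spatial_attention(x)
```

Because the gate lies in (0, 1), the output keeps the input's shape while attenuating each spatial location; channel attention works analogously with pooling over the spatial axes instead.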
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | lightweight; multi-scale; human pose; classification |
| Subjects: | Z Bibliography. Library Science. Information Resources > ZZ OA Fund (articles) |
| Divisions: | Faculty of Science and Health Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 18 Feb 2026 15:52 |
| Last Modified: | 18 Feb 2026 16:13 |
| URI: | http://repository.essex.ac.uk/id/eprint/42768 |
Available files
Filename: sensors-26-01102.pdf
Licence: Creative Commons: Attribution 4.0