Zhu, Xuqi and Zhang, Huaizhi and Lee, JunKyu and Zhu, Jiacheng and Pal, Chandrajit and Saha, Sangeet and McDonald-Maier, Klaus D and Zhai, Xiaojun (2026) Mitigating scalability challenges in LUT-based neural networks via pruning optimisations. IEEE Transactions on Computers. pp. 1-14. DOI https://doi.org/10.1109/tc.2026.3703350
Abstract
Modern deep neural networks rely heavily on a large number of multiply-accumulate operations, which constitute the predominant computational cost. To address this, Look-Up Table (LUT)-based matrix multiplication has emerged as a promising alternative for reducing the computational cost and time of the multiply-accumulate operations in a neural network. However, LUT-based neural networks still face scalability challenges due to the inherent limitations of LUT-based matrix multiplication. To mitigate these limitations, this paper proposes a scalable and energy-efficient LUT-based approximate matrix multiplication unit (LUT-MU), which serves as the basic component of the neural networks, by integrating a pruning strategy into the MADDNESS algorithm, a LUT-based matrix multiplication methodology. As problem size and precision demands in matrix multiplication increase, our proposed LUT-MU architecture effectively constrains resource expansion. The case study shows that deploying our LUT-MU in neural network architectures, including fully connected layers (MNIST) and ResNets (CIFAR-10, ImageNet), on XCZU7EV and XCZU19EG FPGAs produces up to 1.6× throughput improvement and 4.2× energy efficiency gains over mainstream CUDA-based network implementations, and 1.8× energy efficiency compared to leading quantised neural network implementations, with moderate impact on accuracy. Compared to the original MADDNESS-based neural networks, our LUT-MU shows 1.3× to 2.6× resource savings, depending on the resolution configuration of MADDNESS.
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Hardware-software co-design; LUT-based matrix multiplications |
| Divisions: | Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 19 Jun 2026 13:30 |
| Last Modified: | 19 Jun 2026 13:30 |
| URI: | http://repository.essex.ac.uk/id/eprint/43416 |
Available files
Filename: TC2025Revision.pdf
Licence: Creative Commons: Attribution 4.0