Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/17842
| Title: | C-SIMD: CORDIC-Driven SIMD Processing Element for Resource-Efficient Multi-Precision Deep Learning Inference |
| Authors: | Trivedi, Vasundhara; Vishvakarma, Santosh Kumar |
| Issue Date: | 2026 |
| Publisher: | Institute of Electrical and Electronics Engineers Inc. |
| Citation: | Trivedi, V., Raut, G., Mohammad, B. S., Vishvakarma, S. K., & Kumar, A. (2026). C-SIMD: CORDIC-Driven SIMD Processing Element for Resource-Efficient Multi-Precision Deep Learning Inference. IEEE Access. https://doi.org/10.1109/ACCESS.2026.3653253 |
| Abstract: | The growing demand for efficient deep learning inference on edge devices requires hardware that is both precision-adaptive and resource-efficient. This paper introduces C-SIMD, a CORDIC-driven, configurable SIMD Processing Element (PE) architecture for scalable, multi-precision MAC operations in DNN accelerators. C-SIMD supports dynamic operand precision (4/8/16/32-bit) and enables symmetric and asymmetric computation modes, covering integer and fixed-point arithmetic. By leveraging partial product computation with pipelined 8-bit CORDIC-based approximate multipliers, the architecture scales efficiently to higher precision while achieving notable area and power savings. A configurable pipeline offers tunable trade-offs between accuracy and complexity, making C-SIMD suitable for resource-constrained inference. Strategic reuse of the adder in the accumulation path enhances throughput and optimizes resource utilization. Unlike prior designs, C-SIMD fully exploits available resources and supports configurations such as 16 parallel 8×8-bit, 4 parallel 16×16-bit, a single 32×32-bit, and asymmetric 32×8-bit MACs. Hardware evaluation demonstrates up to 14.29% area savings and as much as 16.17× throughput improvement. The proposed C-SIMD_Low (4/8/16) achieves 7.04 GOP/s, while C-SIMD_High (8/16/32) attains 4.16 GOP/s, delivering a 4× performance-efficiency gain over prior MAC architectures. Inference tests indicate minimal accuracy loss (below 1% on MNIST-LeNet, under 2.9% on CIFAR-10-AlexNet, and less than 2.2% on CIFAR-10-VGG16 compared to float32 baselines), demonstrating its potential for high-throughput, energy-efficient Edge-AI systems. |
| URI: | https://dx.doi.org/10.1109/ACCESS.2026.3653253 ; https://dspace.iiti.ac.in:8080/jspui/handle/123456789/17842 |
| Type of Material: | Journal Article |
| Appears in Collections: | Department of Electrical Engineering |
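
The abstract rests on two ideas that its compressed wording can obscure: linear-mode CORDIC reduces multiplication to a pipeline of shift-and-add iterations (the tunable accuracy/complexity knob the abstract mentions), and wide MACs are composed from narrow 8×8-bit partial products (the basis of the 4/8/16/32-bit SIMD modes). The minimal Python sketch below illustrates both ideas under stated assumptions; the function names, operand ranges, and iteration count are illustrative choices, not the authors' hardware design.

```python
# Illustrative sketch only: loosely inspired by the abstract's description,
# not the C-SIMD implementation. Operand range and iteration count are
# assumptions made for this example.

def cordic_mul_approx(x: float, y: float, iterations: int = 8) -> float:
    """Approximate x*y with linear-mode CORDIC shift-add iterations.

    The loop drives a residual copy of y toward zero while accumulating
    signed, right-shifted copies of x. More iterations (a deeper pipeline)
    yield higher accuracy; convergence requires |y| < 2.
    """
    acc = 0.0
    residual = y
    for i in range(iterations):
        step = 2.0 ** (-i)
        if residual >= 0:
            acc += x * step       # accumulate x >> i
            residual -= step
        else:
            acc -= x * step       # subtract x >> i
            residual += step
    return acc


def mul16_from_8bit_partials(a: int, b: int) -> int:
    """Compose an unsigned 16x16-bit product from four 8x8-bit partial
    products, mirroring how SIMD lanes of narrow multipliers scale to
    wider operands."""
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    return ((a_hi * b_hi) << 16) + ((a_hi * b_lo + a_lo * b_hi) << 8) + a_lo * b_lo


if __name__ == "__main__":
    print(cordic_mul_approx(0.75, 0.6))   # ~0.451; error shrinks with more iterations
    print(mul16_from_8bit_partials(1234, 5678) == 1234 * 5678)  # True
```

Truncating the iteration count in cordic_mul_approx mirrors shortening the hardware pipeline: fewer stages cost less area and power but admit more error, which is the accuracy/complexity trade-off the abstract describes.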
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.