Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/17842
| Title: | C-SIMD: CORDIC-Driven SIMD Processing Element for Resource-Efficient Multi-Precision Deep Learning Inference |
| Authors: | Trivedi, Vasundhara; Vishvakarma, Santosh Kumar |
| Issue Date: | 2026 |
| Publisher: | Institute of Electrical and Electronics Engineers Inc. |
| Citation: | Trivedi, V., Raut, G., Mohammad, B. S., Vishvakarma, S. K., & Kumar, A. (2026). C-SIMD: CORDIC-Driven SIMD Processing Element for Resource-Efficient Multi-Precision Deep Learning Inference. IEEE Access. https://doi.org/10.1109/ACCESS.2026.3653253 |
| Abstract: | The growing demand for efficient deep learning inference on edge devices requires hardware that is both precision-adaptive and resource-efficient. This paper introduces C-SIMD, a CORDIC-driven, configurable SIMD Processing Element (PE) architecture for scalable, multi-precision MAC operations in DNN accelerators. C-SIMD supports dynamic operand precision (4/8/16/32-bit) and enables symmetric and asymmetric computation modes, covering integer and fixed-point arithmetic. By leveraging partial product computation with pipelined 8-bit CORDIC-based approximate multipliers, the architecture scales efficiently to higher precision while achieving notable area and power savings. A configurable pipeline offers tunable trade-offs between accuracy and complexity, making C-SIMD suitable for resource-constrained inference. Strategic reuse of the adder in the accumulation path enhances throughput and optimizes resource utilization. Unlike prior designs, C-SIMD fully exploits available resources and supports configurations such as 16 parallel 8×8-bit, 4 parallel 16×16-bit, a single 32×32-bit, and asymmetric 32×8-bit MACs. Hardware evaluation demonstrates up to 14.29% area savings and as much as 16.17× throughput improvement. The proposed C-SIMD_Low (4/8/16) achieves 7.04 GOP/s, while C-SIMD_High (8/16/32) attains 4.16 GOP/s, delivering a 4× performance-efficiency gain over prior MAC architectures. Inference tests indicate minimal accuracy loss (below 1% on MNIST-LeNet, under 2.9% on CIFAR-10-AlexNet, and less than 2.2% on CIFAR-10-VGG16 compared to float32 baselines), demonstrating its potential for high-throughput, energy-efficient Edge-AI systems. |
| URI: | https://dx.doi.org/10.1109/ACCESS.2026.3653253 ; https://dspace.iiti.ac.in:8080/jspui/handle/123456789/17842 |
| Type of Material: | Journal Article |
| Appears in Collections: | Department of Electrical Engineering |
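
The abstract rests on two ideas that its compressed wording can obscure: linear-mode CORDIC reduces multiplication to a pipeline of shift-and-add iterations (the tunable accuracy/complexity knob the abstract mentions), and wide MACs are composed from narrow 8×8-bit partial products (the basis of the 4/8/16/32-bit SIMD modes). The minimal Python sketch below illustrates both ideas under stated assumptions; the function names, operand ranges, and iteration count are illustrative choices, not the authors' hardware design.

```python
# Illustrative sketch only: loosely inspired by the abstract's description,
# not the C-SIMD implementation. Operand range and iteration count are
# assumptions made for this example.

def cordic_mul_approx(x: float, y: float, iterations: int = 8) -> float:
    """Approximate x*y with linear-mode CORDIC shift-add iterations.

    The loop drives a residual copy of y toward zero while accumulating
    signed, right-shifted copies of x. More iterations (a deeper pipeline)
    yield higher accuracy; convergence requires |y| < 2.
    """
    acc = 0.0
    residual = y
    for i in range(iterations):
        step = 2.0 ** (-i)
        if residual >= 0:
            acc += x * step       # accumulate x >> i
            residual -= step
        else:
            acc -= x * step       # subtract x >> i
            residual += step
    return acc


def mul16_from_8bit_partials(a: int, b: int) -> int:
    """Compose an unsigned 16x16-bit product from four 8x8-bit partial
    products, mirroring how SIMD lanes of narrow multipliers scale to
    wider operands."""
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    return ((a_hi * b_hi) << 16) + ((a_hi * b_lo + a_lo * b_hi) << 8) + a_lo * b_lo


if __name__ == "__main__":
    print(cordic_mul_approx(0.75, 0.6))   # ~0.451; error shrinks with more iterations
    print(mul16_from_8bit_partials(1234, 5678) == 1234 * 5678)  # True
```

Truncating the iteration count in cordic_mul_approx mirrors shortening the hardware pipeline: fewer stages cost less area and power but admit more error, which is the accuracy/complexity trade-off the abstract describes.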
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.