Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/17842
Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Trivedi, Vasundhara | en_US |
| dc.contributor.author | Vishvakarma, Santosh Kumar | en_US |
| dc.date.accessioned | 2026-02-10T15:50:13Z | - |
| dc.date.available | 2026-02-10T15:50:13Z | - |
| dc.date.issued | 2026 | - |
| dc.identifier.citation | Trivedi, V., Raut, G., Mohammad, B. S., Vishvakarma, S. K., & Kumar, A. (2026). C-SIMD: CORDIC-Driven SIMD Processing Element for Resource-Efficient Multi-Precision Deep Learning Inference. IEEE Access. https://doi.org/10.1109/ACCESS.2026.3653253 | en_US |
| dc.identifier.other | EID(2-s2.0-105027538570) | - |
| dc.identifier.uri | https://dx.doi.org/10.1109/ACCESS.2026.3653253 | - |
| dc.identifier.uri | https://dspace.iiti.ac.in:8080/jspui/handle/123456789/17842 | - |
| dc.description.abstract | The growing demand for efficient deep learning inference on edge devices requires hardware that is both precision-adaptive and resource-efficient. This paper introduces C-SIMD, a CORDIC-driven, configurable SIMD Processing Element (PE) architecture for scalable, multi-precision MAC operations in DNN accelerators. C-SIMD supports dynamic operand precision (4/8/16/32-bit) and enables symmetric and asymmetric computation modes, covering integer and fixed-point arithmetic. By building higher-precision products from partial products computed with pipelined 8-bit CORDIC-based approximate multipliers, the architecture scales efficiently while achieving notable area and power savings. A configurable pipeline offers tunable trade-offs between accuracy and complexity, making C-SIMD suitable for resource-constrained inference. Strategic reuse of the adder in the accumulation path enhances throughput and optimizes resource utilization. Unlike prior designs, C-SIMD fully exploits available resources and supports configurations such as 16 parallel 8×8-bit, 4 parallel 16×16-bit, a single 32×32-bit, and asymmetric 32×8-bit MACs. Hardware evaluation demonstrates up to 14.29% area savings and as much as 16.17× throughput improvement. The proposed C-SIMD_Low (4/8/16) achieves 7.04 GOP/s, while C-SIMD_High (8/16/32) attains 4.16 GOP/s, delivering a 4× performance-efficiency gain over prior MAC architectures. Inference tests indicate minimal accuracy loss (below 1% on MNIST-LeNet, under 2.9% on CIFAR-10-AlexNet, and less than 2.2% on CIFAR-10-VGG16 relative to float32 baselines), demonstrating its potential for high-throughput, energy-efficient Edge-AI systems. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US |
| dc.source | IEEE Access | en_US |
| dc.title | C-SIMD: CORDIC-Driven SIMD Processing Element for Resource-Efficient Multi-Precision Deep Learning Inference | en_US |
| dc.type | Journal Article | en_US |
| dc.rights.license | All Open Access | - |
| dc.rights.license | Gold Open Access | - |
| Appears in Collections: | Department of Electrical Engineering | |
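The abstract describes MAC units built from 8-bit CORDIC-based approximate multipliers. As a rough illustration of the underlying idea only (this is not the paper's pipeline, and the function name and parameters below are hypothetical), linear-mode CORDIC reduces multiplication to shift-and-add iterations, so truncating the iteration count trades accuracy for hardware cost:

```python
def cordic_multiply(x, z, iterations=8):
    """Approximate x * z with linear-mode (rotation) CORDIC.

    Each iteration only shifts and adds, which is why the technique
    maps cheaply to hardware. z must lie in roughly [-2, 2) for
    convergence; fewer iterations yield a coarser, "approximate"
    product. Illustrative sketch, not the C-SIMD implementation.
    """
    y = 0.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0   # steer z toward zero
        y += d * x * 2.0 ** -i        # accumulate shifted copies of x
        z -= d * 2.0 ** -i            # consume the multiplier
    return y                          # y -> x * z as iterations grow
```

With enough iterations the result converges (e.g. `cordic_multiply(0.5, 1.5, 20)` is close to 0.75); an 8-iteration variant corresponds to a deliberately approximate multiplier.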
Files in This Item:
There are no files associated with this item.