Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/17843
Full metadata record
DC Field | Value | Language
dc.contributor.author | Trivedi, Vasundhara | en_US
dc.contributor.author | Bagga, Harman Singh | en_US
dc.contributor.author | Vishvakarma, Santosh Kumar | en_US
dc.date.accessioned | 2026-02-10T15:50:13Z | -
dc.date.available | 2026-02-10T15:50:13Z | -
dc.date.issued | 2026 | -
dc.identifier.citation | Trivedi, V., Raut, G., & Vishvakarma, S. K. (2026). Adaptive-precision SIMD architecture for high-throughput and resource-efficient DNN acceleration. Integration, 108. https://doi.org/10.1016/j.vlsi.2026.102666 | en_US
dc.identifier.issn | 0167-9260 | -
dc.identifier.other | EID(2-s2.0-105027733065) | -
dc.identifier.uri | https://dx.doi.org/10.1016/j.vlsi.2026.102666 | -
dc.identifier.uri | https://dspace.iiti.ac.in:8080/jspui/handle/123456789/17843 | -
dc.description.abstract | Deep Neural Network (DNN) accelerators require high computational throughput and flexible precision support while operating under stringent resource and power constraints. To address these challenges, we propose an adaptive-precision SIMD (Single Instruction, Multiple Data) Processing Element (PE) architecture for signed integer and fixed-point operations that maximizes resource utilization and enhances parallelism in multiply–accumulate (MAC) computations. The design introduces efficient resource reuse during partial product accumulation and supports both symmetric and asymmetric precision modes. Unlike state-of-the-art approaches, the proposed PE dynamically scales computation: processing 16 operands at low precision (4-bit), four operands at medium precision (8-bit), and a single operand at high precision (16-bit). Additionally, it supports asymmetric operations such as 16×4-bit multiplications in parallel, enabling unique flexibility and performance gains. The architecture is implemented and tested on ASIC and FPGA platforms. Accuracy evaluations across different DNN models and datasets show very small losses at reduced precision—less than 1% for LeNet on MNIST, 2.9% for AlexNet on CIFAR-10, 2.2% for VGG16 on CIFAR-10, and 3.5% for VGG16 on ImageNet-1000 compared to float32. Hardware synthesis yields significant improvements, including 46.2% fewer LUTs and 2.45× lower power on FPGA compared to existing designs. The proposed architecture delivers 2× higher throughput and up to 4.8× higher energy efficiency with 28.57% less area at 65 nm, compared to existing works, making it ideal for applications with variable precision and limited resources. © 2026 Elsevier B.V. | en_US
dc.language.iso | en | en_US
dc.publisher | Elsevier B.V. | en_US
dc.source | Integration | en_US
dc.title | Adaptive-precision SIMD architecture for high-throughput and resource-efficient DNN acceleration | en_US
dc.type | Journal Article | en_US
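The precision-scaling scheme described in the abstract (16 operands at 4-bit, four at 8-bit, one at 16-bit) can be illustrated with a minimal behavioral sketch. This is not the authors' RTL: the function name, packing order, and range checks are illustrative, and the asymmetric 16×4-bit mode is omitted for brevity; only the operand counts and bit widths come from the abstract.

```python
def simd_mac(a_ops, b_ops, bits):
    """Signed multiply-accumulate over one SIMD lane group.

    bits selects the symmetric precision mode (4, 8, or 16); the
    lane count scales inversely with precision, mirroring the
    16/4/1-operand scheme in the abstract.
    """
    lanes = {4: 16, 8: 4, 16: 1}[bits]
    assert len(a_ops) == len(b_ops) == lanes, "wrong operand count for mode"
    # Signed range for the selected precision, e.g. [-8, 7] for 4-bit.
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    assert all(lo <= x <= hi for x in list(a_ops) + list(b_ops))
    # All lane products are accumulated into a single result.
    return sum(a * b for a, b in zip(a_ops, b_ops))

# 4-bit mode: 16 parallel signed multiplies, one accumulated result.
print(simd_mac([1] * 16, [2] * 16, bits=4))   # 32
# 16-bit mode: a single high-precision multiply.
print(simd_mac([3000], [-7], bits=16))        # -21000
```

The dictionary lookup makes the throughput trade-off explicit: halving the operand width quadruples the number of parallel lanes, which is the source of the 2× throughput gain the abstract reports at reduced precision.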
Appears in Collections:Department of Electrical Engineering

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
