Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/13897
Title: | A Precision-Aware Neuron Engine for DNN Accelerators |
Authors: | Raut, Gopal; Jaiswal, Sonu; Vishvakarma, Santosh Kumar |
Keywords: | Activation function;Approximate computing;Deep neural networks;Edge-AI;Multiply-accumulate unit;Neuron engine;Precision-aware architecture |
Issue Date: | 2024 |
Publisher: | Springer |
Citation: | Vishwakarma, S., Raut, G., Jaiswal, S., Vishvakarma, S. K., & Ghai, D. (2024). A Precision-Aware Neuron Engine for DNN Accelerators. SN Computer Science. Scopus. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85191734640&doi=10.1007%2fs42979-024-02851-z&partnerID=40&md5=b9e885f2adfa25e4445331012003fd65 |
Abstract: | Deep Neural Networks (DNNs) form the backbone of contemporary deep learning, powering various artificial intelligence (AI) applications. However, their computational demands, stemming primarily from the resource-intensive Neuron Engine (NE), present a critical challenge. The NE comprises Multiply-and-Accumulate (MAC) and Activation Function (AF) operations, which contribute significantly to the overall computational overhead. To address these challenges, we propose a Precision-aware Neuron Engine (PNE) architecture, introducing a novel approach to low-bit and high-bit precision computation with minimal resource utilization. The PNE's MAC unit pre-loads the accumulator register with the bias value, eliminating the need for additional components such as an extra adder, multiplexer, and bias register. This design achieves significant resource savings: an 8-bit signed fixed-point implementation demonstrates notable reductions in resource utilization, critical delay, and power-delay product (PDP) compared to conventional architectures. An 8-bit sfixed<N,q> implementation of the MAC in the PNE shows 29.23% savings in resource utilization and 32.91% savings in critical delay compared with the IEEE architecture, and 24.91% savings in PDP compared with the Booth architecture. Our comprehensive evaluation demonstrates the PNE's efficacy in maintaining inference accuracy across quantized and unquantized models. The proposed design not only achieves precision-awareness with a minimal (≈ 10%) increase in resource overhead, but also delivers a 34.61% increase in throughput and a reduction in critical delay (34.37% faster than the conventional design), highlighting its efficiency gains and superior performance in PNE computations.
A software emulator shows minimal accuracy losses ranging from 0.6% to 1.6%, and the PNE proves its versatility across different precisions and datasets, including MNIST (on LeNet) and ImageNet (on CaffeNet). The flexibility and configurability of the PNE make it a promising solution for precision-aware neuron processing, particularly in edge AI applications with stringent hardware constraints. This research contributes a pivotal advancement towards enhancing the efficiency of DNN computations through precision-aware architecture, paving the way for more resource-efficient and high-performance AI systems. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2024. |
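The abstract's key MAC optimization is pre-loading the accumulator with the bias value, so no separate adder, multiplexer, or bias register is needed after the accumulation loop. A minimal behavioral sketch of that idea (not the paper's actual hardware description; the function and parameter names here are illustrative assumptions):

```python
def neuron_engine(weights, inputs, bias, activation):
    """Behavioral sketch of a neuron with a bias pre-loaded MAC.

    Instead of starting the accumulator at zero and adding the bias in a
    separate post-accumulation step, the accumulator is initialized with
    the bias, mirroring the resource saving described in the abstract.
    """
    acc = bias  # pre-load accumulator with bias (rather than acc = 0)
    for w, x in zip(weights, inputs):
        acc += w * x  # multiply-and-accumulate step
    return activation(acc)  # activation function applied to the MAC result


def relu(v):
    """Simple ReLU activation used for the example."""
    return max(0, v)


# Example: (1*4) + (-2*5) + (3*6) = 12, plus bias 2, then ReLU -> 14
print(neuron_engine([1, -2, 3], [4, 5, 6], bias=2, activation=relu))  # 14
```

In a hardware MAC, the equivalent change is initializing the accumulator register to the bias at the start of the dot-product cycle, rather than summing the bias through an extra adder stage afterwards.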
URI: | https://doi.org/10.1007/s42979-024-02851-z https://dspace.iiti.ac.in/handle/123456789/13897 |
ISSN: | 2662-995X |
Type of Material: | Journal Article |
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.