Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/13897
Title: | A Precision-Aware Neuron Engine for DNN Accelerators |
Authors: | Raut, Gopal; Jaiswal, Sonu; Vishvakarma, Santosh Kumar |
Keywords: | Activation function;Approximate computing;Deep neural networks;Edge-AI;Multiply-accumulate unit;Neuron engine;Precision-aware architecture |
Issue Date: | 2024 |
Publisher: | Springer |
Citation: | Vishwakarma, S., Raut, G., Jaiswal, S., Vishvakarma, S. K., & Ghai, D. (2024). A Precision-Aware Neuron Engine for DNN Accelerators. SN Computer Science. Scopus. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85191734640&doi=10.1007%2fs42979-024-02851-z&partnerID=40&md5=b9e885f2adfa25e4445331012003fd65 |
Abstract: | Deep Neural Networks (DNNs) form the backbone of contemporary deep learning, powering various artificial intelligence (AI) applications. However, their computational demands, stemming primarily from the resource-intensive Neuron Engine (NE), present a critical challenge. The NE comprises Multiply-and-Accumulate (MAC) and Activation Function (AF) operations, which contribute significantly to the overall computational overhead. To address these challenges, we propose a Precision-aware Neuron Engine (PNE) architecture, introducing a novel approach to low-bit and high-bit precision computation with minimal resource utilization. The PNE's MAC unit pre-loads the accumulator register with the bias value, eliminating the need for additional components such as an extra adder, multiplexer, and bias register. This design achieves significant resource savings: an 8-bit signed fixed-point implementation demonstrates notable reductions in resource utilization, critical delay, and power-delay product (PDP) compared to conventional architectures. An 8-bit sfixed<N,q> implementation of the MAC in the PNE shows 29.23% savings in resource utilization and 32.91% savings in critical delay compared with the IEEE architecture, and 24.91% savings in PDP compared with the Booth architecture. Our comprehensive evaluation demonstrates the PNE's efficacy in maintaining inference accuracy across quantized and unquantized models. The proposed design not only achieves precision-awareness with a minimal (≈ 10%) increase in resource overhead, but also delivers a 34.61% increase in throughput and a reduction in critical delay (34.37% faster than the conventional design), highlighting its efficiency gains and superior performance in PNE computations.
A software emulator shows minimal accuracy losses ranging from 0.6% to 1.6%, and the PNE proves its versatility across different precisions and datasets, including MNIST (on LeNet) and ImageNet (on CaffeNet). The flexibility and configurability of the PNE make it a promising solution for precision-aware neuron processing, particularly in edge AI applications with stringent hardware constraints. This research contributes a pivotal advancement towards enhancing the efficiency of DNN computations through precision-aware architecture, paving the way for more resource-efficient and high-performance AI systems. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2024. |
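The abstract's key MAC optimization is pre-loading the accumulator with the bias value, so no separate adder, multiplexer, or bias register is needed after the accumulation loop. A minimal behavioral sketch of that idea (not the paper's actual hardware description; the function and parameter names here are illustrative assumptions):

```python
def neuron_engine(weights, inputs, bias, activation):
    """Behavioral sketch of a neuron with a bias pre-loaded MAC.

    Instead of starting the accumulator at zero and adding the bias in a
    separate post-accumulation step, the accumulator is initialized with
    the bias, mirroring the resource saving described in the abstract.
    """
    acc = bias  # pre-load accumulator with bias (rather than acc = 0)
    for w, x in zip(weights, inputs):
        acc += w * x  # multiply-and-accumulate step
    return activation(acc)  # activation function applied to the MAC result


def relu(v):
    """Simple ReLU activation used for the example."""
    return max(0, v)


# Example: (1*4) + (-2*5) + (3*6) = 12, plus bias 2, then ReLU -> 14
print(neuron_engine([1, -2, 3], [4, 5, 6], bias=2, activation=relu))  # 14
```

In a hardware MAC, the equivalent change is initializing the accumulator register to the bias at the start of the dot-product cycle, rather than summing the bias through an extra adder stage afterwards.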
URI: | https://doi.org/10.1007/s42979-024-02851-z https://dspace.iiti.ac.in/handle/123456789/13897 |
ISSN: | 2662-995X |
Type of Material: | Journal Article |
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.