Designing a Performance-Centric MAC Unit with Pipelined Architecture for DNN Accelerators

Raut, Gopal; Mukala, Jogesh; Vishvakarma, Santosh Kumar

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/12012

Full metadata record

DC Field	Value	Language
dc.contributor.author	Raut, Gopal	en_US
dc.contributor.author	Mukala, Jogesh	en_US
dc.contributor.author	Vishvakarma, Santosh Kumar	en_US
dc.date.accessioned	2023-06-24T13:06:41Z	-
dc.date.available	2023-06-24T13:06:41Z	-
dc.date.issued	2023	-
dc.identifier.issn	0278-081X	-
dc.identifier.other	EID(2-s2.0-85159404426)	-
dc.identifier.uri	https://doi.org/10.1007/s00034-023-02387-2	-
dc.identifier.uri	https://dspace.iiti.ac.in/handle/123456789/12012	-
dc.description.abstract	Improving the performance of deep neural network (DNN) accelerators requires optimizing both compute efficiency and operating frequency. However, contemporary DNN implementations often require excessive resources because of their heavy multiply-and-accumulate (MAC) computations. This work proposes a MAC unit based on a Co-ordinate Rotation DIgital Computer (CORDIC) architecture that is both power- and area-efficient for 8-bit and higher bit precision. CORDIC-based designs are typically associated with low throughput. To address this issue, a performance-centric pipelined architecture that increases throughput is investigated. The study conducts a detailed Pareto analysis of accuracy variation across precision levels and of the number of pipeline stages required to achieve high performance. Post-synthesis results for the proposed MAC unit at the 45 nm technology node are provided, and performance is evaluated on a deep neural network using a Virtex-7 FPGA board. The proposed fixed-point MAC architecture is scalable to any bit precision and flexible in the placement of the decimal point. The study finds that the proposed MAC with fixed-point Q3.5 precision and five pipeline stages achieves better performance metrics than the recursive CORDIC-based MAC design: its area-delay product (ADP) is 1.13× lower and its throughput is 2.73× higher. Evaluated on a fully connected neural network for the MNIST dataset, the proposed MAC unit delivers 1.89× better throughput than the conventional MAC-based design. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.	en_US
dc.language.iso	en	en_US
dc.publisher	Birkhauser	en_US
dc.source	Circuits, Systems, and Signal Processing	en_US
dc.subject	CORDIC-based architecture	en_US
dc.subject	Deep neural networks (DNNs)	en_US
dc.subject	Multiply-and-accumulate (MAC) unit	en_US
dc.subject	Performance-centric pipelined architecture	en_US
dc.subject	Throughput	en_US
dc.title	Designing a Performance-Centric MAC Unit with Pipelined Architecture for DNN Accelerators	en_US
dc.type	Journal Article	en_US
Appears in Collections:	Department of Electrical Engineering
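
The abstract describes a CORDIC-based fixed-point MAC with Q3.5 precision and five pipeline stages. As a rough software illustration only, the sketch below shows how linear-mode CORDIC can approximate acc + x*z with shifts and adds. It is not the authors' architecture: the function names (to_q35, from_q35, cordic_mac) are hypothetical, Q3.5 is assumed to mean a signed 8-bit value with 5 fractional bits, and mapping one CORDIC iteration to one pipeline stage is an assumption made here for illustration.

```python
FRAC_BITS = 5               # assumed Q3.5 layout: 5 fractional bits
SCALE = 1 << FRAC_BITS      # 1.0 is stored as 32


def to_q35(value: float) -> int:
    """Quantize a float to a signed 8-bit Q3.5 integer, saturating at the limits."""
    raw = round(value * SCALE)
    return max(-128, min(127, raw))


def from_q35(raw: int) -> float:
    """Convert a Q3.5 integer back to a float."""
    return raw / SCALE


def cordic_mac(acc: int, x: int, z: int, iterations: int = 5) -> int:
    """Approximate acc + x*z with linear-mode CORDIC (shift-and-add only).

    acc, x and z are Q3.5 integers. Each iteration drives the residual z
    toward zero by +/- 2^-i and adds or subtracts x * 2^-i to the running
    sum. With iterations starting at i = 0, the method converges for |z| <= 2.
    """
    y = acc
    for i in range(iterations):
        step = SCALE >> i           # 2^-i expressed in Q3.5
        if z >= 0:
            y += x >> i             # arithmetic shift stands in for x * 2^-i
            z -= step
        else:
            y -= x >> i
            z += step
    return y


result = cordic_mac(0, to_q35(1.5), to_q35(0.75))
print(from_q35(result))  # 1.21875 (exact product is 1.125)
```

With five iterations the residual multiplier is at most 2^-4, so the product error is bounded by roughly |x|/16 plus a small truncation term from the shifts. This illustrates the kind of accuracy trade-off between precision and iteration count that the abstract's Pareto analysis refers to.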

Files in This Item:

There are no files associated with this item.
