Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/12012
Title: Designing a Performance-Centric MAC Unit with Pipelined Architecture for DNN Accelerators
Authors: Raut, Gopal
Mukala, Jogesh
Vishvakarma, Santosh Kumar
Keywords: CORDIC-based architecture;Deep neural networks (DNNs);Multiply-and-accumulate (MAC) unit;Performance-centric pipelined architecture;Throughput
Issue Date: 2023
Publisher: Birkhauser
Citation: Raut, G., Mukala, J., Sharma, V., & Vishvakarma, S. K. (2023). Designing a performance-centric MAC unit with pipelined architecture for DNN accelerators. Circuits, Systems, and Signal Processing, doi:10.1007/s00034-023-02387-2
Abstract: To improve the performance of deep neural network (DNN) accelerators, it is necessary to optimize compute efficiency and operating frequency. However, implementing contemporary DNNs often requires excessive resources because of their heavy multiply-and-accumulate (MAC) computations. This work proposes a MAC unit designed with a COordinate Rotation DIgital Computer (CORDIC)-based architecture, which is both power- and area-efficient for 8-bit and higher bit precision. CORDIC-based designs are typically associated with low throughput. To address this issue, a performance-centric pipelined architecture that increases throughput is investigated. The study conducts a detailed Pareto analysis of accuracy variation across precision levels and of the number of pipeline stages required to achieve high performance. Post-synthesis results for the proposed MAC unit at the 45 nm technology node are provided, and performance is evaluated on a deep neural network using a Virtex-7 FPGA board. The proposed fixed-point MAC architecture is scalable to any bit precision and flexible in the position of the decimal point. The study finds that the proposed MAC with fixed-point Q3.5 precision and five pipeline stages shows better performance metrics than the recursive CORDIC-based MAC design. The proposed MAC design has a 1.13× lower area-delay product (ADP) and 2.73× higher throughput than the recursive CORDIC-based MAC. The study also evaluated the proposed MAC unit on a fully connected neural network for the MNIST dataset and found its throughput to be 1.89× better than that of the conventional MAC-based design. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
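Illustrative sketch (not from the paper): the abstract describes a CORDIC-based MAC in fixed-point Q3.5 with five pipeline stages. The Python model below shows how a linear-mode CORDIC can approximate acc + x*w using only shifts and adds, under assumptions that this record does not confirm: Q3.5 is taken to mean an 8-bit word with 5 fractional bits, iterations start at shift index 0 (so the weight must satisfy |w| < 2 for convergence), and the accumulator is left unbounded. The names `to_fixed`, `from_fixed`, and `cordic_mac` are hypothetical. In a pipelined design, each loop iteration would correspond to one hardware stage.

```python
FRAC_BITS = 5            # assumed Q3.5 layout: 5 fractional bits in an 8-bit word
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0
STAGES = 5               # five CORDIC iterations, one per assumed pipeline stage


def to_fixed(value):
    """Convert a real number to a Q3.5 integer (round to nearest)."""
    return int(round(value * ONE))


def from_fixed(raw):
    """Convert a Q3.5 integer back to a real number."""
    return raw / ONE


def cordic_mac(acc, x, w, stages=STAGES):
    """Approximate acc + x*w with linear-mode CORDIC (shift-and-add only).

    All arguments are Q3.5 integers. Assumes |w| < 2 in real terms so the
    residual is driven toward zero; the accumulator is not saturated here.
    """
    z = w
    for i in range(stages):
        d = 1 if z >= 0 else -1      # direction that shrinks the residual z
        acc += d * (x >> i)          # add or subtract x * 2^-i
        z -= d * (ONE >> i)          # consume 2^-i of the remaining weight
    return acc


# Usage: 1.5 * 0.75 with five iterations
raw = cordic_mac(0, to_fixed(1.5), to_fixed(0.75))
print(raw, from_fixed(raw))  # 39 1.21875 (the exact product is 1.125)
```

With five iterations the residual weight is at most 2^-4, so the product error is bounded by roughly |x|/16 plus a few least-significant bits of shift truncation. This is the accuracy-versus-stage-count trade-off that the abstract's Pareto analysis refers to.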
URI: https://doi.org/10.1007/s00034-023-02387-2
https://dspace.iiti.ac.in/handle/123456789/12012
ISSN: 0278-081X
Type of Material: Journal Article
Appears in Collections: Department of Electrical Engineering

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
