Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/12012
Full metadata record
DC Field | Value | Language
dc.contributor.author | Raut, Gopal | en_US
dc.contributor.author | Mukala, Jogesh | en_US
dc.contributor.author | Vishvakarma, Santosh Kumar | en_US
dc.date.accessioned | 2023-06-24T13:06:41Z | -
dc.date.available | 2023-06-24T13:06:41Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Raut, G., Mukala, J., Sharma, V., & Vishvakarma, S. K. (2023). Designing a performance-centric MAC unit with pipelined architecture for DNN accelerators. Circuits, Systems, and Signal Processing, doi:10.1007/s00034-023-02387-2 | en_US
dc.identifier.issn | 0278-081X | -
dc.identifier.other | EID(2-s2.0-85159404426) | -
dc.identifier.uri | https://doi.org/10.1007/s00034-023-02387-2 | -
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/12012 | -
dc.description.abstract | Improving the performance of deep neural network (DNN) accelerators requires optimizing compute efficiency and operating frequency. However, contemporary DNNs often require excessive resources because of their heavy multiply-and-accumulate (MAC) computations. This work proposes a MAC unit based on a COordinate Rotation DIgital Computer (CORDIC) architecture that is both power- and area-efficient for 8-bit and higher precision. CORDIC-based designs are typically associated with low throughput. To address this issue, a performance-centric pipelined architecture that increases throughput is investigated. The study conducts a detailed Pareto analysis of accuracy variation at different precision levels and of the pipeline stages required to achieve high performance. Post-synthesis results for the proposed MAC unit at the 45 nm technology node are provided, and performance is evaluated on a deep neural network using a Virtex-7 FPGA board. The proposed fixed-point MAC architecture is scalable to any bit precision and flexible in the position of the decimal point. The study finds that the proposed MAC with fixed-point Q3.5 precision and five pipeline stages shows better performance metrics than the recursive CORDIC-based MAC design. The proposed MAC design has a 1.13× lower area-delay product (ADP) and 2.73× higher throughput than the recursive CORDIC-based MAC. Evaluated on a fully connected neural network for the MNIST dataset, the proposed MAC unit achieves 1.89× better throughput than the conventional MAC-based design. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature. | en_US
dc.language.iso | en | en_US
dc.publisher | Birkhäuser | en_US
dc.source | Circuits, Systems, and Signal Processing | en_US
dc.subject | CORDIC-based architecture | en_US
dc.subject | Deep neural networks (DNNs) | en_US
dc.subject | Multiply-and-accumulate (MAC) unit | en_US
dc.subject | Performance-centric pipelined architecture | en_US
dc.subject | Throughput | en_US
dc.title | Designing a Performance-Centric MAC Unit with Pipelined Architecture for DNN Accelerators | en_US
dc.type | Journal Article | en_US
Appears in Collections:Department of Electrical Engineering
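
Illustrative note: the abstract refers to a CORDIC-based fixed-point MAC at Q3.5 precision. The Python sketch below shows the general idea of a linear-mode CORDIC multiply-accumulate using only shifts and adds. It is not the authors' design. The helper names (to_fixed, to_float, cordic_mac), the reading of Q3.5 as 5 fractional bits, the iteration count, and the restriction of the weight to magnitudes below 2 are all assumptions made for illustration; saturation, rounding, and pipelining are not modelled.

```python
FRAC_BITS = 5  # assumption: Q3.5 read as 5 fractional bits (1 LSB = 1/32)


def to_fixed(value: float) -> int:
    """Quantize a real number to an integer scaled by 2**FRAC_BITS."""
    return int(round(value * (1 << FRAC_BITS)))


def to_float(fixed: int) -> float:
    """Convert a scaled integer back to a real number."""
    return fixed / (1 << FRAC_BITS)


def cordic_mac(acc: int, x: int, w: int, iterations: int = FRAC_BITS + 1) -> int:
    """Approximate acc + x*w with linear-mode CORDIC (shift-and-add only).

    All arguments are scaled integers. The weight w is assumed to satisfy
    |w| < 2.0, the convergence range when the first step size is 1.0.
    """
    y, z = acc, w
    for i in range(iterations):
        step = (1 << FRAC_BITS) >> i  # 2**-i expressed in fixed point
        if z >= 0:
            y += x >> i  # add x * 2**-i (arithmetic shift, truncating)
            z -= step    # drive the residual weight towards zero
        else:
            y -= x >> i
            z += step
    return y


def dot(xs, ws):
    """Accumulate a dot product by chaining cordic_mac calls."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = cordic_mac(acc, to_fixed(x), to_fixed(w))
    return to_float(acc)


if __name__ == "__main__":
    inputs = [1.5, -0.25, 2.0]
    weights = [0.75, 1.0, -0.5]
    exact = sum(a * b for a, b in zip(inputs, weights))
    print("CORDIC MAC:", dot(inputs, weights), "exact:", exact)
```

Each loop iteration corresponds to one shift-add step. In this sketch the result differs from the exact product by at most |x|/32 plus a few LSBs of truncation error, which is why the number of iterations trades accuracy against latency.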

Files in This Item:
There are no files associated with this item.

