Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/13610
Full metadata record
DC Field | Value | Language
dc.contributor.author | Ashar, Neha | en_US
dc.contributor.author | Raut, Gopal | en_US
dc.contributor.author | Trivedi, Vasundhara | en_US
dc.contributor.author | Vishvakarma, Santosh Kumar | en_US
dc.date.accessioned | 2024-04-26T12:43:28Z | -
dc.date.available | 2024-04-26T12:43:28Z | -
dc.date.issued | 2024 | -
dc.identifier.citation | Ashar, N., Raut, G., Trivedi, V., Vishvakarma, S. K., & Kumar, A. (2024). QuantMAC: Enhancing Hardware Performance in DNNs with Quantize Enabled Multiply-Accumulate Unit. IEEE Access. Scopus. https://doi.org/10.1109/ACCESS.2024.3379906 | en_US
dc.identifier.issn | 2169-3536 | -
dc.identifier.other | EID(2-s2.0-85188432106) | -
dc.identifier.uri | https://doi.org/10.1109/ACCESS.2024.3379906 | -
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/13610 | -
dc.description.abstract | In response to the escalating demand for hardware-efficient Deep Neural Network (DNN) architectures, we present a novel quantize-enabled multiply-accumulate (MAC) unit. Our methodology employs a right shift-and-add computation for MAC operation, enabling runtime truncation without additional hardware. This architecture optimally utilizes hardware resources, enhancing throughput performance while reducing computational complexity through bit-truncation techniques. Our key methodology involves designing a hardware-efficient MAC computational algorithm that supports both iterative and pipeline implementations, catering to diverse hardware efficiency or enhanced throughput requirements in accelerators. Additionally, we introduce a processing element (PE) with a pre-loading bias scheme, reducing one clock delay and eliminating the need for conventional extra resources in PE implementation. The PE facilitates quantization-based MAC calculations through an efficient bit-truncation method, removing the necessity for extra hardware logic. This versatile PE accommodates variable bit-precision with a dynamic fraction part within the sfxpt<N,f> representation, meeting specific model or layer demands. Through software emulation, our proposed approach demonstrates minimal accuracy loss, revealing under 1.6% loss for LeNet-5 using MNIST and around 4% for ResNet-18 and VGG-16 with CIFAR-10 in the sfxpt<8,5> format compared to conventional float32-based implementations. Hardware performance parameters on the Xilinx-Virtex-7 board unveil a 37% reduction in area utilization and a 45% reduction in power consumption compared to the best state-of-the-art MAC architecture. Extending the proposed MAC to a LeNet DNN model results in a 42% reduction in resource requirements and a significant 27% reduction in delay. This architecture provides notable advantages for resource-efficient, high-throughput edge-AI applications. © 2013 IEEE. | en_US
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US
dc.source | IEEE Access | en_US
dc.subject | Approximate compute | en_US
dc.subject | bit-truncation | en_US
dc.subject | CORDIC | en_US
dc.subject | deep neural network | en_US
dc.subject | hardware accelerator | en_US
dc.subject | quantize processing element | en_US
dc.title | QuantMAC: Enhancing Hardware Performance in DNNs with Quantize Enabled Multiply-Accumulate Unit | en_US
dc.type | Journal Article | en_US
dc.rights.license | All Open Access, Gold | -
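
The abstract above describes the core computation: a multiply-accumulate realized with a right shift-and-add loop, quantized by truncation to a signed fixed-point sfxpt<N,f> format, with the bias pre-loaded into the accumulator. The following is a minimal software sketch of that idea, assuming a plain N-bit two's complement encoding with f fraction bits; all function names and parameters are illustrative assumptions and are not taken from the published QuantMAC implementation.

    # Minimal sketch (not the authors' code) of a truncating shift-and-add MAC
    # on sfxpt<N, f> values, assumed here to be N-bit two's complement integers
    # with f fraction bits.

    def to_sfxpt(x, n=8, f=5):
        """Quantize a real value to sfxpt<N, f> by truncation, with saturation."""
        q = int(x * (1 << f))                        # truncate toward zero
        lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
        return max(lo, min(hi, q))

    def shift_add_mul(a_q, b_q, n=8, f=5):
        """Multiply two sfxpt<N, f> integers with a right shift-and-add loop.

        Multiplier bits are consumed LSB-first; each step conditionally adds
        the multiplicand to the upper half of the partial sum, then shifts the
        sum right by one. The final >> f models the runtime truncation that the
        abstract attributes to the right shifts themselves.
        """
        p = 0
        for i in range(n):
            if (b_q >> i) & 1:
                # The two's complement sign bit of the multiplier is subtracted.
                p += -(a_q << n) if i == n - 1 else (a_q << n)
            p >>= 1                                  # arithmetic right shift
        return p >> f                                # back to f fraction bits

    def mac(acc_q, a_q, b_q, n=8, f=5):
        """One MAC step on sfxpt<N, f> values: acc += a * b (no overflow check)."""
        return acc_q + shift_add_mul(a_q, b_q, n, f)

    # Usage: a tiny dot product with the bias pre-loaded into the accumulator,
    # in the spirit of the pre-loading bias scheme mentioned in the abstract.
    x, w, bias = [0.50, -0.25, 0.75], [0.25, 0.50, -0.125], 0.125
    acc = to_sfxpt(bias)
    for xi, wi in zip(x, w):
        acc = mac(acc, to_sfxpt(xi), to_sfxpt(wi))
    print(acc / (1 << 5))                            # 0.03125, matching the float result here

This sketch only illustrates the arithmetic of a single quantized MAC; the sub-1.6% and roughly 4% accuracy figures quoted in the abstract come from the authors' full-model emulation, not from this example.
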
Appears in Collections: Department of Electrical Engineering

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
