Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/11905
Title: An Empirical Evaluation of Enhanced Performance Softmax Function in Deep Learning
Authors: Mehra, Sumiran
Raut, Gopal
Purkayastha, Ribhu Das
Vishvakarma, Santosh Kumar
Keywords: CORDIC algorithm;deep learning;hardware optimization;performance enhancement;pipeline stages;Softmax function (SF)
Issue Date: 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Citation: Mehra, S., Raut, G., Purkayastha, R. D., Vishvakarma, S. K., & Biasizzo, A. (2023). An empirical evaluation of enhanced performance softmax function in deep learning. IEEE Access, 11, 34912-34924. doi:10.1109/ACCESS.2023.3265327
Abstract: This article presents a highly efficient, performance-enhanced Softmax Function (SF) designed for a deep neural network accelerator. The SF is an essential component of deep learning models: it is used primarily in the classification layer, and also in the hidden layers of advanced neural networks such as Transformer and Capsule networks. The primary challenge in designing an efficient hardware architecture for the SF lies in its complex exponential and division sub-blocks. To address this challenge, a hardware-optimized, pipelined architecture based on the CO-ordinate Rotational DIgital Computer (CORDIC) algorithm is proposed. It leverages the mutual exclusivity of the CORDIC algorithm and is designed to improve throughput, area, and power. To maintain good accuracy in deep learning models, the proposed SF design undergoes a Pareto study that evaluates how accuracy varies with the number of pipeline stages. The proposed design is quantized to 16-bit precision, and inference accuracy is validated on different datasets. The SF is prototyped on a Xilinx Zynq FPGA operating at 685 MHz, and an ASIC implementation is performed at the 45 nm technology node with a maximum operating frequency of 5 GHz. The design achieves a validation accuracy loss of less than 2% while reducing silicon area and Energy-Delay-Product (EDP) by 12×. Post-synthesis simulation results indicate that the proposed design outperforms state-of-the-art architectures, achieving 3× better performance in terms of area, power, and logic delay. © 2013 IEEE.
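The CORDIC-based exponential and division steps mentioned in the abstract can be illustrated with a small floating-point model. This is a sketch under stated assumptions, not the authors' architecture. It assumes a textbook hyperbolic CORDIC in rotation mode for exp (after range reduction t = k·ln 2 + r) and a linear CORDIC in vectoring mode for the division. It also treats each CORDIC iteration as one pipeline stage. The function names (`hyperbolic_indices`, `cordic_exp`, `cordic_divide`, `cordic_softmax`, `exact_softmax`) and the `stages` parameter are hypothetical, and the 16-bit fixed-point quantization is not modeled.

```python
import math

LN2 = math.log(2.0)


def hyperbolic_indices(stages):
    """Shift indices for hyperbolic CORDIC: 1, 2, 3, 4, 4, 5, ..., 13, 13, ...
    Iterations 4, 13, 40, ... are repeated so that the rotation converges."""
    idx, i, next_repeat = [], 1, 4
    while len(idx) < stages:
        idx.append(i)
        if i == next_repeat and len(idx) < stages:
            idx.append(i)  # repeated iteration
            next_repeat = 3 * next_repeat + 1
        i += 1
    return idx


def cordic_exp(t, stages):
    """exp(t) via range reduction t = k*ln2 + r and hyperbolic CORDIC on r."""
    k = round(t / LN2)   # multiplying by 2**k is just a shift in hardware
    r = t - k * LN2      # |r| <= ln2/2, inside the CORDIC convergence range
    idx = hyperbolic_indices(stages)
    # Hyperbolic CORDIC scale factor (about 0.828 for many iterations).
    scale = math.prod(math.sqrt(1.0 - 2.0 ** (-2 * i)) for i in idx)
    x, y, z = 1.0 / scale, 0.0, r  # pre-compensate the scale factor
    for i in idx:
        s = 1.0 if z >= 0 else -1.0  # rotate toward z = 0
        x, y = x + s * y * 2.0 ** -i, y + s * x * 2.0 ** -i
        z -= s * math.atanh(2.0 ** -i)
    # x -> cosh(r), y -> sinh(r); cosh + sinh = exp(r), then scale by 2**k.
    return math.ldexp(x + y, k)


def cordic_divide(num, den, stages):
    """num/den via linear CORDIC in vectoring mode.
    Assumes den > 0 and 0 <= num/den < 2; error is at most 2**-(stages-1)."""
    y, z = num, 0.0
    for i in range(stages):
        if y >= 0:   # residual positive: quotient estimate is too small
            y -= den * 2.0 ** -i
            z += 2.0 ** -i
        else:        # residual negative: quotient estimate is too large
            y += den * 2.0 ** -i
            z -= 2.0 ** -i
    return z


def cordic_softmax(logits, stages=16):
    """Softmax using the CORDIC exp and divide models above."""
    m = max(logits)  # subtract the max so every exponent is <= 0
    exps = [cordic_exp(v - m, stages) for v in logits]
    total = sum(exps)  # >= ~1, so each ratio lies in [0, 1]
    return [cordic_divide(e, total, stages) for e in exps]


def exact_softmax(logits):
    """Reference softmax in double precision."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1, -3.5]
    ref = exact_softmax(logits)
    for stages in (4, 8, 12, 16):
        approx = cordic_softmax(logits, stages)
        err = max(abs(a - b) for a, b in zip(approx, ref))
        print(f"{stages:2d} stages: max abs error = {err:.2e}")
```

Sweeping `stages` gives only a rough analogue of the accuracy-versus-pipeline-depth trade-off that the Pareto study examines. With fewer iterations the pipeline is shorter, but the residual error grows in both the exponential and the division.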
URI: https://doi.org/10.1109/ACCESS.2023.3265327
https://dspace.iiti.ac.in/handle/123456789/11905
ISSN: 2169-3536
Type of Material: Journal Article
Appears in Collections:Department of Electrical Engineering


