An Empirical Evaluation of Enhanced Performance Softmax Function in Deep Learning

Mehra, Sumiran; Raut, Gopal; Purkayastha, Ribhu Das; Vishvakarma, Santosh Kumar

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/11905

Title:	An Empirical Evaluation of Enhanced Performance Softmax Function in Deep Learning
Authors:	Mehra, Sumiran; Raut, Gopal; Purkayastha, Ribhu Das; Vishvakarma, Santosh Kumar
Keywords:	CORDIC algorithm;deep learning;hardware optimization;performance enhancement;pipeline stages;Softmax function (SF)
Issue Date:	2023
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Abstract:	This article presents a highly efficient, performance-enhanced Softmax Function (SF) designed for a deep neural network accelerator. The SF is an essential component of deep learning models, used primarily in the classification layer and also in the hidden layers of advanced neural networks such as Transformer and Capsule networks. The primary challenge in designing an efficient hardware architecture for the SF lies in its complex exponential and division sub-blocks. To address this challenge, a hardware-optimized, pipelined CORDIC-based architecture is proposed that leverages the mutual exclusivity of the COordinate Rotation DIgital Computer (CORDIC) algorithm and is designed for enhanced throughput, area, and power. To maintain good accuracy in deep learning models, the proposed SF design undergoes a Pareto study that evaluates how accuracy varies with the number of pipeline stages. The proposed design is quantized to 16-bit precision, and inference accuracy is validated on different datasets. The SF is prototyped on a Xilinx Zynq FPGA operating at 685 MHz, and an ASIC implementation is performed for the 45 nm technology node at a maximum operating frequency of 5 GHz. The design achieves a validation accuracy loss of less than 2% while reducing silicon area and Energy-Delay-Product (EDP) by 12×. Post-synthesis simulation results indicate that the proposed design outperforms state-of-the-art architectures, achieving 3× better performance in terms of area, power, and logic delay. © 2013 IEEE.
URI:	https://doi.org/10.1109/ACCESS.2023.3265327 https://dspace.iiti.ac.in/handle/123456789/11905
ISSN:	2169-3536
Type of Material:	Journal Article
Appears in Collections:	Department of Electrical Engineering
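
Illustrative sketch (not part of the record): the abstract names the exponential and division sub-blocks as the costly parts of the SF and proposes CORDIC for both. The Python below is a minimal floating-point model of that general idea, assuming hyperbolic-rotation CORDIC for the exponential and linear-vectoring CORDIC for the division. It is not the authors' 16-bit pipelined architecture. The function names, the iteration count of 16, and the power-of-two range reduction are all assumptions made for illustration.

```python
import math

LN2 = math.log(2.0)

def hyperbolic_indices(n):
    """Shift indices 1, 2, 3, 4, 4, 5, ... (4, 13, 40, ... repeated for convergence)."""
    idx, i, next_repeat = [], 1, 4
    while len(idx) < n:
        idx.append(i)
        if i == next_repeat and len(idx) < n:
            idx.append(i)                      # repeated iteration
            next_repeat = 3 * next_repeat + 1
        i += 1
    return idx

def cordic_exp(t, n=16):
    """Approximate exp(t) with hyperbolic CORDIC in rotation mode."""
    # Assumed range reduction: exp(t) = 2**k * exp(r), with |r| <= ln(2)/2.
    k = round(t / LN2)
    r = t - k * LN2
    idx = hyperbolic_indices(n)
    gain = math.prod(math.sqrt(1.0 - 2.0 ** (-2 * i)) for i in idx)
    x, y, z = 1.0 / gain, 0.0, r               # pre-scale to cancel the CORDIC gain
    for i in idx:
        d = 1.0 if z >= 0 else -1.0
        step = 2.0 ** -i                        # a right shift in hardware
        x, y, z = x + d * y * step, y + d * x * step, z - d * math.atanh(step)
    return math.ldexp(x + y, k)                 # cosh(r) + sinh(r) = exp(r)

def cordic_divide(num, den, n=16):
    """Approximate num/den (den > 0, |num/den| < 2) with linear CORDIC in vectoring mode."""
    y, z = num, 0.0
    for i in range(n):
        step = 2.0 ** -i
        if y >= 0:
            y -= den * step
            z += step
        else:
            y += den * step
            z -= step
    return z

def cordic_softmax(logits, n=16):
    """Softmax built from the two CORDIC primitives above."""
    m = max(logits)                             # subtract the max so every exponent is <= 0
    exps = [cordic_exp(v - m, n) for v in logits]
    total = sum(exps)
    return [cordic_divide(e, total, n) for e in exps]
```

In this model every iteration uses only additions, a sign test, and multiplications by powers of two, which is what makes the scheme attractive for hardware. A fixed-point version would replace the floating-point steps with shifts and a small table of `atanh` constants.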

