Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/11905
Title: | An Empirical Evaluation of Enhanced Performance Softmax Function in Deep Learning |
Authors: | Mehra, Sumiran; Raut, Gopal; Purkayastha, Ribhu Das; Vishvakarma, Santosh Kumar |
Keywords: | CORDIC algorithm;deep learning;hardware optimization;performance enhancement;pipeline stages;Softmax function (SF) |
Issue Date: | 2023 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Citation: | Mehra, S., Raut, G., Purkayastha, R. D., Vishvakarma, S. K., & Biasizzo, A. (2023). An empirical evaluation of enhanced performance softmax function in deep learning. IEEE Access, 11, 34912-34924. doi:10.1109/ACCESS.2023.3265327 |
Abstract: | This article presents a highly efficient and performance-enhanced Softmax Function (SF) designed for a deep neural network accelerator. The SF is an essential component of deep learning models, primarily used in the classification layer, and also in hidden layers of advanced neural networks such as Transformer and Capsule networks. The primary challenge of designing an efficient hardware architecture for SF is the complex exponential and division computational sub-blocks. To address this challenge, a hardware-optimized pipelined CORDIC-based architecture is proposed, leveraging the mutual exclusivity of the CO-ordinate Rotational DIgital Computer (CORDIC) algorithm, designed for enhanced throughput, area, and power. To maintain good accuracy in deep learning models, the proposed SF design undergoes a Pareto study that evaluates the variation of accuracy with respect to the number of pipeline stages. The proposed design is quantized to 16-bit precision, and inference accuracy is validated for different datasets. The SF is prototyped on a Xilinx Zynq FPGA, operating at 685 MHz, and an ASIC implementation is performed for a 45 nm technology node with a maximum operating frequency of 5 GHz. The design achieves a validation accuracy loss of less than 2% while reducing silicon area and Energy-Delay-Product (EDP) by 12×. Post-synthesis simulation results indicate that the proposed design outperforms state-of-the-art architectures, achieving 3× better performance in terms of area, power, and logic delay. © 2013 IEEE. |
URI: | https://doi.org/10.1109/ACCESS.2023.3265327 https://dspace.iiti.ac.in/handle/123456789/11905 |
ISSN: | 2169-3536 |
Type of Material: | Journal Article |
Appears in Collections: | Department of Electrical Engineering |
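Illustrative note: the abstract describes a softmax built from CORDIC-based exponential and division sub-blocks. The following is a minimal floating-point sketch of that general idea, not the authors' pipelined 16-bit hardware design. The function names (`cordic_exp`, `cordic_div`, `cordic_softmax`), the iteration count, and the power-of-two range reduction are all assumptions made here for illustration; the record does not specify them.

```python
import math

LN2 = math.log(2.0)

def cordic_exp(r, n_iter=16):
    """Approximate exp(r) for small |r| with hyperbolic rotation-mode CORDIC."""
    # Hyperbolic CORDIC starts at i = 1 and repeats iterations 4 and 13
    # so that the shift-add sequence converges.
    idx, i = [], 1
    while len(idx) < n_iter:
        idx.append(i)
        if i in (4, 13):
            idx.append(i)
        i += 1
    # Pre-compensate the constant CORDIC gain.
    gain = 1.0
    for i in idx:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))
    # Starting with x = y makes x converge to cosh(r) + sinh(r) = exp(r).
    x = y = 1.0 / gain
    z = r
    for i in idx:
        d = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i
        x, y = x + d * y * t, y + d * x * t
        z -= d * math.atanh(t)
    return x

def cordic_div(num, den, n_iter=16):
    """Approximate num / den (den > 0, |num / den| < 2) with linear vectoring-mode CORDIC."""
    y, z = num, 0.0
    for i in range(n_iter):
        t = 2.0 ** -i
        if y >= 0:
            y -= den * t
            z += t
        else:
            y += den * t
            z -= t
    return z

def cordic_softmax(logits, n_iter=16):
    """Softmax using the two CORDIC helpers above (assumed structure)."""
    m = max(logits)  # subtract the maximum so every exponent is <= 0
    exps = []
    for v in logits:
        s = v - m
        k = round(s / LN2)   # assumed range reduction: exp(s) = 2**k * exp(r)
        r = s - k * LN2      # |r| <= ln(2) / 2, inside the CORDIC convergence range
        exps.append(math.ldexp(cordic_exp(r, n_iter), k))
    total = sum(exps)
    return [cordic_div(e, total, n_iter) for e in exps]
```

Every output is at most 1 after normalization, so the quotient stays inside the convergence range of the linear-mode divider. A fixed-point version would replace the floating-point multiplications by powers of two with shifts.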
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.