Flex-PE: Flexible and SIMD Multiprecision Processing Element for AI Workloads

Lokhande, Mukul; Raut, Gopal; Vishvakarma, Santosh Kumar

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/15937

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lokhande, Mukul	en_US
dc.contributor.author	Raut, Gopal	en_US
dc.contributor.author	Vishvakarma, Santosh Kumar	en_US
dc.date.accessioned	2025-04-22T17:45:34Z	-
dc.date.available	2025-04-22T17:45:34Z	-
dc.date.issued	2025	-
dc.identifier.citation	Lokhande, M., Raut, G., & Vishvakarma, S. K. (2025). Flex-PE: Flexible and SIMD Multiprecision Processing Element for AI Workloads. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. https://doi.org/10.1109/TVLSI.2025.3553069	en_US
dc.identifier.issn	1063-8210	-
dc.identifier.other	EID(2-s2.0-105002672559)	-
dc.identifier.uri	https://doi.org/10.1109/TVLSI.2025.3553069	-
dc.identifier.uri	https://dspace.iiti.ac.in/handle/123456789/15937	-
dc.description.abstract	The rapid evolution of artificial intelligence (AI) models, from deep neural networks (DNNs) to transformers and large language models (LLMs), demands flexible hardware solutions to meet diverse execution needs across edge and cloud platforms. Existing accelerators lack unified support for multiprecision arithmetic and runtime-configurable activation functions (AFs). This work proposes Flex-PE, a single instruction, multiple data (SIMD)-enabled multiprecision processing element that efficiently integrates multiply-and-accumulate operations with configurable AFs, including Sigmoid, Tanh, ReLU, and SoftMax, using unified hardware. The proposed design achieves throughput improvements of up to 16x for FxP4, 8x for FxP8, 4x for FxP16, and 1x for FxP32, with maximum hardware efficiency for both iterative and pipelined architectures. An area-efficient iterative Flex-PE-based SIMD systolic array reduces DMA reads by up to 62x for input feature maps and 371x for weight filters in VGG-16, achieving 8.42 GOPS/W energy efficiency with minimal accuracy loss (<2%). Flex-PE scales from 4-bit edge inference to FxP8/16/32, supporting edge and cloud high-performance computing (HPC) while providing high-performance adaptable AI hardware with optimal precision, throughput, and energy efficiency.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.source	IEEE Transactions on Very Large Scale Integration (VLSI) Systems	en_US
dc.subject	Activation functions (AFs)	en_US
dc.subject	CORDIC	en_US
dc.subject	deep learning accelerators	en_US
dc.subject	multiprecision systolic arrays	en_US
dc.subject	single instruction, multiple data (SIMD) processing elements	en_US
dc.title	Flex-PE: Flexible and SIMD Multiprecision Processing Element for AI Workloads	en_US
dc.type	Journal Article	en_US
Appears in Collections:	Department of Electrical Engineering
