Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/15937
Full metadata record
DC Field | Value | Language
dc.contributor.author | Lokhande, Mukul | en_US
dc.contributor.author | Raut, Gopal | en_US
dc.contributor.author | Vishvakarma, Santosh Kumar | en_US
dc.date.accessioned | 2025-04-22T17:45:34Z | -
dc.date.available | 2025-04-22T17:45:34Z | -
dc.date.issued | 2025 | -
dc.identifier.citation | Lokhande, M., Raut, G., & Vishvakarma, S. K. (2025). Flex-PE: Flexible and SIMD Multiprecision Processing Element for AI Workloads. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. https://doi.org/10.1109/TVLSI.2025.3553069 | en_US
dc.identifier.issn | 1063-8210 | -
dc.identifier.other | EID(2-s2.0-105002672559) | -
dc.identifier.uri | https://doi.org/10.1109/TVLSI.2025.3553069 | -
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/15937 | -
dc.description.abstract | The rapid evolution of artificial intelligence (AI) models, from deep neural networks (DNNs) to transformers/large-language models (LLMs), demands flexible hardware solutions to meet diverse execution needs across edge and cloud platforms. Existing accelerators lack unified support for multiprecision arithmetic and runtime-configurable activation functions (AFs). This work proposes Flex-PE, a single instruction, multiple data (SIMD)-enabled multiprecision processing element that efficiently integrates multiply-and-accumulate operations with configurable AFs using unified hardware, including Sigmoid, Tanh, ReLU, and SoftMax. The proposed design achieves throughput improvements of up to 16x FxP4, 8x FxP8, 4x FxP16, and 1x FxP32, with maximum hardware efficiency for both iterative and pipelined architectures. An area-efficient iterative Flex-PE-based SIMD systolic array reduces DMA reads by up to 62x and 371x for input feature maps and weight filters in VGG-16, achieving 8.42 GOPS/W energy efficiency with minimal accuracy loss (<2%). Flex-PE scales from 4-bit edge inference to FxP8/16/32, supporting edge and cloud high-performance computing (HPC) while providing high-performance adaptable AI hardware with optimal precision, throughput, and energy efficiency. © 1993-2012 IEEE. | en_US
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US
dc.source | IEEE Transactions on Very Large Scale Integration (VLSI) Systems | en_US
dc.subject | Activation functions (AFs) | en_US
dc.subject | CORDIC | en_US
dc.subject | deep learning accelerators | en_US
dc.subject | multiprecision systolic arrays | en_US
dc.subject | single instruction, multiple data (SIMD) processing elements | en_US
dc.title | Flex-PE: Flexible and SIMD Multiprecision Processing Element for AI Workloads | en_US
dc.type | Journal Article | en_US
Appears in Collections: Department of Electrical Engineering

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
