Malware Detection Using Hybrid Vision Transformer with CNN backbone

Sharmila, S. P.; Chaudhari, Narendra Shivaji

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/18009

Full metadata record

DC Field	Value	Language
dc.contributor.author	Sharmila, S. P.	en_US
dc.contributor.author	Chaudhari, Narendra Shivaji	en_US
dc.date.accessioned	2026-03-12T10:55:39Z	-
dc.date.available	2026-03-12T10:55:39Z	-
dc.date.issued	2025	-
dc.identifier.citation	Sharmila, S. P., Pandey, S., Chaurasia, A., & Chaudhari, N. S. (2025). Malware Detection Using Hybrid Vision Transformer with CNN backbone. In Proceedings of the 2025 International Conference on Emerging Techniques in Computational Intelligence, ICETCI 2025. https://doi.org/10.1109/ICETCI67340.2025.11257893	en_US
dc.identifier.isbn	979-8331595630	-
dc.identifier.other	EID(2-s2.0-105030208515)	-
dc.identifier.uri	https://dx.doi.org/10.1109/ICETCI67340.2025.11257893	-
dc.identifier.uri	https://dspace.iiti.ac.in/handle/123456789/18009	-
dc.description.abstract	Traditional signature-based malware detection techniques struggle to keep up with the evolving nature of malicious software. These methods often fail to detect novel, obfuscated, or polymorphic malware, posing significant challenges for cybersecurity professionals. As a result, there has been a growing shift toward intelligent, learning-based techniques that can generalize across malware variants. In this paper, we propose a novel hybrid deep learning model that combines Convolutional Neural Networks (CNNs) and Vision Transformers (ViT) to enhance malware detection. The approach begins by converting executable (.exe) files into grayscale images, thereby transforming raw binary data into a visual domain compatible with modern computer vision models. CNNs are used for local feature extraction, capturing spatial relationships such as byte-level patterns and textures within the image, while the ViT models global dependencies and contextual relationships across the image using self-attention mechanisms. This combination enables the model to detect both simple and complex malware families with high accuracy. The proposed architecture is evaluated on a benchmark malware image dataset and demonstrates superior performance, achieving an accuracy of 98.22% and a precision of 98.35%, along with excellent F1-score and recall. The results indicate the model's robustness, scalability, and potential for integration into real-time malware detection systems to enhance cybersecurity infrastructure. © 2025 IEEE.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.source	Proceedings of the 2025 International Conference on Emerging Techniques in Computational Intelligence, ICETCI 2025	en_US
dc.title	Malware Detection Using Hybrid Vision Transformer with CNN backbone	en_US
dc.type	Conference Paper	en_US
Appears in Collections:	Department of Computer Science and Engineering
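
The abstract describes converting executable files into grayscale images before classification, but this record does not give the exact procedure. The following is a minimal sketch of one common way to do such a conversion, assuming each byte becomes one pixel intensity (0-255) and rows have a fixed width. The function name `bytes_to_grayscale`, the default width of 256, and the zero-padding of the last row are illustrative assumptions, not details taken from the paper.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Wrap raw bytes into rows of `width` pixels, one byte per pixel.

    Hypothetical helper: the width and the zero-padding policy are
    assumptions made for illustration, not the authors' settings.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    # Zero-pad so the final row is complete.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# A synthetic byte string stands in for the contents of an executable.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
```

In practice the bytes would come from reading a file in binary mode, and the resulting 2D array would typically be resized to the input resolution expected by the downstream model.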
