Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/12862
Full metadata record
DC Field    Value    Language
dc.contributor.author    Dixit, Aditya    en_US
dc.contributor.author    Gupta, Anup Kumar    en_US
dc.contributor.author    Gupta, Puneet    en_US
dc.date.accessioned    2023-12-22T09:16:24Z    -
dc.date.available    2023-12-22T09:16:24Z    -
dc.date.issued    2023    -
dc.identifier.citation    Dixit, A., Gupta, A. K., & Gupta, P. (2023). UNFOLD: 3-D U-Net, 3-D CNN, and 3-D Transformer-Based Hyperspectral Image Denoising. IEEE Transactions on Geoscience and Remote Sensing. Scopus. https://doi.org/10.1109/TGRS.2023.3328922    en_US
dc.identifier.issn    0196-2892    -
dc.identifier.other    EID(2-s2.0-85177039506)    -
dc.identifier.uri    https://doi.org/10.1109/TGRS.2023.3328922    -
dc.identifier.uri    https://dspace.iiti.ac.in/handle/123456789/12862    -
dc.description.abstract    Hyperspectral images (HSIs) encompass data across numerous spectral bands, making them valuable in various practical fields such as remote sensing, agriculture, and marine monitoring. Unfortunately, inevitable noise introduction during sensing restricts their applicability, necessitating denoising for optimal utilization. The existing deep learning (DL)-based denoising methods suffer from various limitations. For instance, convolutional neural networks (CNNs) struggle with long-range dependencies, while vision transformers (ViTs) struggle to capture local details. This article introduces a novel method, UNFOLD, that addresses these inherent limitations by harmoniously integrating the strengths of 3-D U-Net, 3-D CNN, and 3-D Transformer architectures. Unlike several existing methods that predominantly capture dependencies along either the spatial or the spectral dimension, UNFOLD addresses HSI denoising as a 3-D task, synergizing spatial and spectral information through the use of a 3-D Transformer and a 3-D CNN. It employs the self-attention (SA) mechanism of Transformers to capture global dependencies and model long-range relationships across the spatial and spectral dimensions. To overcome the limitations of the 3-D Transformer in capturing fine-grained local and spatial features, UNFOLD complements it by incorporating a 3-D CNN. Moreover, UNFOLD utilizes a modified form of the 3-D U-Net architecture for HSI denoising, wherein it employs a 3-D Transformer-based encoder instead of the conventional 3-D CNN-based encoder. It further capitalizes on the property of U-Net to integrate features across various scales, thereby enhancing efficacy by preserving intricate structural details. Results from extensive experiments demonstrate that UNFOLD outperforms state-of-the-art HSI denoising methods. © 1980-2012 IEEE.    en_US
dc.language.iso    en    en_US
dc.publisher    Institute of Electrical and Electronics Engineers Inc.    en_US
dc.source    IEEE Transactions on Geoscience and Remote Sensing    en_US
dc.subject    3-D convolutional neural networks (CNNs)    en_US
dc.subject    3-D transformers    en_US
dc.subject    3-D U-Net    en_US
dc.subject    hyperspectral imaging denoising    en_US
dc.subject    spatial spectral fusion    en_US
dc.title    UNFOLD: 3-D U-Net, 3-D CNN, and 3-D Transformer-Based Hyperspectral Image Denoising    en_US
dc.type    Journal Article    en_US
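
The abstract above describes a hybrid design: a 3-D U-Net whose encoder uses 3-D Transformer self-attention for global spatial-spectral context, paired with 3-D convolutions for fine local detail, with multi-scale skip connections. The following is a minimal PyTorch sketch of that general idea only, not the authors' UNFOLD implementation; all class names, layer sizes, the residual noise-prediction output, and the use of full (unwindowed) attention are illustrative assumptions.

# Minimal, hypothetical sketch of a 3-D Transformer + 3-D CNN U-Net-style
# denoiser, loosely following the ideas in the abstract above.
# Names and dimensions are illustrative, not the paper's implementation.
import torch
import torch.nn as nn


class Attn3DBlock(nn.Module):
    """Self-attention over all spatial-spectral positions (global context),
    followed by a 3-D convolution to recover fine-grained local detail."""

    def __init__(self, channels, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.GELU(),
        )

    def forward(self, x):                      # x: (B, C, bands, H, W)
        b, c, d, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, D*H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        x = x + attn_out.transpose(1, 2).view(b, c, d, h, w)  # global branch
        return x + self.conv(x)                                # local branch


class HybridDenoiser3D(nn.Module):
    """Two-scale 3-D U-Net: attention+conv encoder, conv decoder, skip link."""

    def __init__(self, base=16):
        super().__init__()
        self.stem = nn.Conv3d(1, base, kernel_size=3, padding=1)
        self.enc1 = Attn3DBlock(base)
        self.down = nn.Conv3d(base, base * 2, kernel_size=2, stride=2)
        self.enc2 = Attn3DBlock(base * 2)
        self.up = nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2)
        self.dec = nn.Sequential(
            nn.Conv3d(base * 2, base, kernel_size=3, padding=1), nn.GELU(),
            nn.Conv3d(base, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):            # x: (B, 1, bands, H, W), noisy HSI cube
        s1 = self.enc1(self.stem(x))
        s2 = self.enc2(self.down(s1))
        up = self.up(s2)
        noise = self.dec(torch.cat([up, s1], dim=1))  # skip connection
        return x - noise             # sketch choice: predict and subtract noise


if __name__ == "__main__":
    noisy = torch.randn(1, 1, 8, 16, 16)       # toy cube: 8 bands, 16x16 pixels
    denoised = HybridDenoiser3D()(noisy)
    print(denoised.shape)                       # torch.Size([1, 1, 8, 16, 16])

On real HSI cubes, full self-attention over every voxel is memory-prohibitive; windowed or grouped attention is the usual workaround, and the published article should be consulted for the actual UNFOLD design.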
Appears in Collections:Department of Computer Science and Engineering

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
