TexStFusion : a controllable diffusion model using textural, structural, and textual feature fusion

Hegde, Suhas G.; Tiwari, Aruna

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/16393

Title:	TexStFusion : a controllable diffusion model using textural, structural, and textual feature fusion
Authors:	Hegde, Suhas G.; Tiwari, Aruna
Keywords:	Controllable diffusion models; Image editing; Image generation; Text-to-Image diffusion models
Issue Date:	2025
Publisher:	Springer Science and Business Media Deutschland GmbH
Citation:	Hegde, S., & Tiwari, A. (2025). TexStFusion : a controllable diffusion model using textural, structural, and textual feature fusion. Signal, Image and Video Processing. https://doi.org/10.1007/s11760-025-04367-2
Abstract:	Recent advances in Text-to-Image (T2I) diffusion models enable highly realistic image generation from text. However, long and intricate text descriptions often fail to provide precise control over the generated image. To address this, we propose TexStFusion (TEXtural, STructural, TEXtual feature FUSION), a method that adds conditional controls to pre-trained T2I models. Unlike existing approaches that rely on visual cues, we introduce composite maps, which fuse texture maps and structure-text maps derived from the TextureNet and StructureNet encoders. This integration requires no fine-tuning of the T2I model, so its prior knowledge is preserved. In the best case, our method improves FID by 25%, SSIM by 33%, and CLIP-T by 5% while using a dataset of only 30k images. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.
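The abstract describes the fusion only at a high level. The following is therefore a hypothetical illustration and not the authors' TextureNet/StructureNet implementation. It makes four assumptions:

- The composite map is a per-position channel concatenation of a texture map and a structure-text map.
- A 1x1-convolution-style linear projection maps it to the backbone's channel width.
- The result is added to the frozen T2I backbone features as a residual scaled by a scalar `gate`.
- The gate borrows the zero-initialized residual idea from ControlNet, and whether TexStFusion uses it is an assumption.

The names `concat_maps`, `project`, `inject`, and `gate` are illustrative.

```python
from typing import List

Vector = List[float]   # feature channels at one spatial position
Matrix = List[List[float]]


def concat_maps(texture: List[Vector], structure_text: List[Vector]) -> List[Vector]:
    """Hypothetical composite map: concatenate channels position by position."""
    if len(texture) != len(structure_text):
        raise ValueError("maps must cover the same spatial positions")
    return [t + s for t, s in zip(texture, structure_text)]


def project(x: Vector, weight: Matrix) -> Vector:
    """Linear projection at one position (what a 1x1 convolution computes)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def inject(frozen_features: List[Vector], composite: List[Vector],
           weight: Matrix, gate: float) -> List[Vector]:
    """Add the projected composite map to frozen backbone features.

    The backbone features are only read, never modified, mirroring the idea of
    conditioning a pre-trained model without fine-tuning it. With gate == 0.0
    (an assumed ControlNet-style initialization) the output equals the frozen
    features exactly.
    """
    out = []
    for f, c in zip(frozen_features, composite):
        delta = project(c, weight)
        out.append([fi + gate * di for fi, di in zip(f, delta)])
    return out
```

For example, take a two-position texture map `[[1.0, 0.0], [0.5, 0.5]]` and a one-channel structure-text map `[[0.2], [0.8]]`. Together they give a three-channel composite map. With `gate=0.0` the frozen features pass through unchanged, and raising the gate blends in the conditioning signal.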
URI:	https://dx.doi.org/10.1007/s11760-025-04367-2; https://dspace.iiti.ac.in:8080/jspui/handle/123456789/16393
ISSN:	1863-1703
Type of Material:	Journal Article
Appears in Collections:	Department of Computer Science and Engineering

