FAUNet: A Fuzzy-Attention U-Net for Diffusion-Based Persian Text Image Super-Resolution

Document Type: Research Paper

Authors

Graduate University of Advanced Technology

10.22111/ijfs.2026.53720.9515

Abstract

The accurate enhancement of text images is a critical challenge in computer vision, particularly for languages such as Persian that exhibit complex writing structures, cursive connections, and fine-grained diacritical marks. Traditional super-resolution approaches often fail to preserve these delicate textual details. Here, a diffusion model is adopted for text image super-resolution, and the U-Net framework of this method is enhanced with fuzzy logic and an attention mechanism (the resulting network is named FAUNet) to address these problems. At the bottleneck of the network, a fuzzy layer softly models uncertainties and boundary variations, while a spatial-channel attention block adaptively emphasizes crucial regions of the image. Together, these components strengthen the network’s capacity to capture the structural dependencies and semantic details essential for text clarity. The proposed model is rigorously evaluated on two large-scale Persian text datasets: IR-LPR, which comprises vehicle license plate images, and IDPL-PFOD2, a dataset of printed Persian text. Experimental results show that FAUNet outperforms state-of-the-art methods, achieving improvements in PSNR, SSIM, and MS-SSIM. These improvements not only yield higher visual quality but also hold strong potential for downstream applications such as optical character recognition (OCR), license plate recognition, and digital document restoration under low-quality imaging conditions.
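
The abstract does not specify the exact form of the fuzzy layer or of the spatial-channel attention block. The NumPy sketch below shows one plausible reading of such a bottleneck, under loudly stated assumptions: Gaussian membership functions with given centers and widths, a 1×1 projection back to the original channel count, CBAM-style channel attention, a simplified spatial attention (a 1×1 mix of channel-wise average and max maps instead of CBAM's 7×7 convolution), and a residual connection. Every function name and parameter here (fuzzy_layer, fuzzy_attention_bottleneck, centers, sigmas, w_proj, and so on) is hypothetical, the weights are random rather than trained, and this is not the FAUNet implementation. In a diffusion-based pipeline, a block like this would sit at the lowest-resolution stage of the denoising U-Net and run at every sampling step.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fuzzy_layer(x, centers, sigmas):
    """Hypothetical fuzzy layer: expand each channel of x (C, H, W) into K
    Gaussian membership degrees. centers, sigmas: arrays of shape (K,).
    Output shape: (C * K, H, W), with every value in [0, 1]."""
    diff = x[:, None, :, :] - centers[None, :, None, None]           # (C, K, H, W)
    mu = np.exp(-(diff ** 2) / (2.0 * sigmas[None, :, None, None] ** 2))
    return mu.reshape(-1, *x.shape[1:])


def channel_attention(x, w1, w2):
    """CBAM-style channel attention: shared MLP over avg- and max-pooled descriptors."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)                           # ReLU hidden layer
    weights = sigmoid(mlp(x.mean(axis=(1, 2))) + mlp(x.max(axis=(1, 2))))  # (C,)
    return x * weights[:, None, None]


def spatial_attention(x, a, b):
    """Simplified spatial attention (assumption): a 1x1 mix of channel-wise
    avg and max maps; CBAM itself uses a 7x7 convolution at this point."""
    weights = sigmoid(a * x.mean(axis=0) + b * x.max(axis=0))        # (H, W)
    return x * weights[None, :, :]


def fuzzy_attention_bottleneck(x, p):
    """Hypothetical bottleneck: fuzzify, project back to C channels,
    apply channel then spatial attention, and add a residual connection."""
    mu = fuzzy_layer(x, p["centers"], p["sigmas"])                    # (C*K, H, W)
    y = np.einsum("oc,chw->ohw", p["w_proj"], mu)                     # 1x1 projection to (C, H, W)
    y = channel_attention(y, p["w1"], p["w2"])
    y = spatial_attention(y, p["a"], p["b"])
    return x + y


# Usage with random (untrained) weights on a toy bottleneck feature map.
rng = np.random.default_rng(0)
C, H, W, K, r = 8, 4, 16, 3, 2   # channels, height, width, memberships, reduction ratio
params = {
    "centers": np.linspace(-1.0, 1.0, K),
    "sigmas": np.full(K, 0.5),
    "w_proj": rng.normal(scale=0.1, size=(C, C * K)),
    "w1": rng.normal(scale=0.1, size=(C // r, C)),
    "w2": rng.normal(scale=0.1, size=(C, C // r)),
    "a": 1.0,
    "b": 1.0,
}
feat = rng.normal(size=(C, H, W))
out = fuzzy_attention_bottleneck(feat, params)
print(out.shape)  # (8, 4, 16)
```

Passing on soft membership degrees instead of raw activations is one way to realize the "soft" treatment of uncertain stroke boundaries described in the abstract. In a trained model the centers, widths, and all weights would be learned, and the actual FAUNet design may differ in any of these choices.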


