Integrating Fuzzy Logic into Transformer-Based Models for Long-Term Multivariate Time Series Forecasting: A Novel Approach to Fuzzy Positional Encoding

Document Type: Research Paper

Authors

1 Department of Computer Engineering, Faculty of Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

2 Department of Computer Engineering, Faculty of Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

3 Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

10.22111/ijfs.2026.54871.9725

Abstract

Long-term multivariate time series forecasting remains one of the most challenging problems in machine learning. Among the proposed solutions, deep learning networks, particularly transformer-based models, have demonstrated superior performance. However, these models are vulnerable to noise, uncertainty, and abrupt changes, and they often lack interpretability. To address these limitations, this study introduces FuzzyPE-KAN, a hybrid architecture that integrates fuzzy logic into the transformer framework. The architecture has three components: (1) a learnable, Gaussian noise-based fuzzy attention mechanism that enhances robustness against noise; (2) a learnable fuzzy positional encoding that uses Gaussian membership functions and multilayer perceptrons to model the inherently vague and graded nature of time; and (3) Kolmogorov–Arnold Networks that completely replace the feed-forward layers, substantially reducing the number of parameters and improving interpretability. The architecture was applied to five state-of-the-art baseline models (Transformer, Informer, PatchTST, Crossformer, and iTransformer) and evaluated on eight standard benchmark datasets (ETTh1, ETTh2, ETTm1, ETTm2, Weather, Electricity, Traffic, and Exchange Rate). Compared with the baseline models, the proposed variants achieve an average improvement of 26–49% in Mean Squared Error and 17–29% in Mean Absolute Error across most scenarios. The largest gains were observed on the Exchange Rate (78%), ETTm2 (78.66%), ETTh2 (76.41%), and Weather (71.28%) datasets. These results indicate that jointly integrating fuzzy logic and Kolmogorov–Arnold Networks into a transformer architecture enhances accuracy and robustness while also improving model interpretability, which supports real-world applications in finance, energy, and healthcare.
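The abstract does not give the equations behind the fuzzy positional encoding, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each time step is normalised to [0, 1], scored against a few Gaussian membership functions, and then mapped to the model dimension by a small two-layer perceptron. The function names, the number of fuzzy sets, the hidden size, and the random initialisation are all hypothetical; in a trained model the centres, widths, and weights would be learned rather than fixed.

```python
import math
import random


def gaussian_membership(x, center, sigma):
    """Degree (0..1] to which x belongs to the fuzzy set centred at `center`."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def fuzzy_positional_encoding(seq_len, d_model, n_sets=4, hidden=8, seed=0):
    """Map each time step to a d_model vector via fuzzy memberships and a tiny MLP.

    Assumption: parameters are randomly initialised for illustration only.
    Requires n_sets >= 2 so the centres can span [0, 1].
    """
    rng = random.Random(seed)
    # Hypothetical initialisation: centres spread evenly over [0, 1].
    centers = [k / (n_sets - 1) for k in range(n_sets)]
    sigmas = [1.0 / n_sets] * n_sets
    # Two weight matrices standing in for the multilayer perceptron.
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_sets)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(d_model)]

    encodings = []
    for t in range(seq_len):
        pos = t / max(seq_len - 1, 1)  # normalise the position to [0, 1]
        # Graded membership of this position in each fuzzy set.
        mu = [gaussian_membership(pos, c, s) for c, s in zip(centers, sigmas)]
        # Hidden layer with tanh, then a linear projection to d_model.
        h = [math.tanh(sum(w * m for w, m in zip(row, mu))) for row in w1]
        encodings.append([sum(w * v for w, v in zip(row, h)) for row in w2])
    return encodings


pe = fuzzy_positional_encoding(seq_len=6, d_model=3)
print(len(pe), len(pe[0]))
```

Because neighbouring positions have overlapping membership degrees, their encodings vary smoothly rather than switching abruptly, which is one way to express the "graded" view of time described above.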
