Optimizing Deep Q-Networks with fuzzy inference-based adaptive replay buffer management

Document Type : Research Paper

Authors

1 Department of Computer Engineering, Lorestan University, Khorramabad, Iran.

2 Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran.

Abstract

Deep reinforcement learning algorithms, such as Deep Q-Networks (DQN), require careful tuning of replay memory parameters. In standard DQN implementations, these parameters remain fixed, which conflicts with the dynamic nature of the learning process, where environmental conditions and reward stability change continuously. This mismatch often results in unstable learning or slow convergence. In this paper, we present a fuzzy logic-based system that adaptively adjusts three key replay memory parameters: memory size, the ratio of recent samples, and priority weight. The proposed fuzzy system evaluates the agent's state by monitoring reward variations and average training errors, and updates these parameters accordingly to maintain suitable values throughout training. To assess the effectiveness of the proposed approach, we compared it with conventional DQN and PER-DQN methods across three benchmark reinforcement learning environments: CartPole-v1, LunarLander-v2, and Taxi-v3. Experimental and statistical analyses demonstrate that our method improves average rewards, reduces training time, and enhances learning stability.
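To make the adaptation idea concrete, the following is a minimal, hypothetical sketch of how a fuzzy inference step of this kind could map the two monitored signals (normalized reward variation and average training error, both assumed in [0, 1]) to the three replay parameters. The membership functions, rule base, and output values are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of fuzzy-style adaptive replay-parameter tuning.
# Thresholds, rules, and output values are illustrative, not from the paper.

def tri(x, a, b, c):
    """Triangular membership function on [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def adapt_replay_params(reward_variation, avg_td_error):
    """Map two normalized inputs in [0, 1] to
    (memory_size, recent_ratio, priority_weight) via a tiny
    Mamdani-style rule base with weighted-average defuzzification."""
    # Fuzzify each input into low / medium / high membership grades.
    grades = {}
    for name, x in (("rv", reward_variation), ("err", avg_td_error)):
        grades[name] = {
            "low": tri(x, -0.5, 0.0, 0.5),
            "med": tri(x, 0.0, 0.5, 1.0),
            "high": tri(x, 0.5, 1.0, 1.5),
        }
    # Rule base: unstable rewards or large errors -> larger memory,
    # smaller share of recent samples, stronger prioritization.
    # Consequents are crisp singletons: (memory_size, recent_ratio, weight).
    rules = [
        ("low", "low", 20_000, 0.50, 0.4),
        ("low", "high", 50_000, 0.30, 0.8),
        ("high", "low", 50_000, 0.30, 0.7),
        ("high", "high", 100_000, 0.15, 0.9),
        ("med", "med", 50_000, 0.30, 0.6),
    ]
    num, den = [0.0, 0.0, 0.0], 0.0
    for rv_lvl, err_lvl, mem, ratio, w in rules:
        strength = min(grades["rv"][rv_lvl], grades["err"][err_lvl])  # AND = min
        den += strength
        for i, v in enumerate((mem, ratio, w)):
            num[i] += strength * v
    if den == 0.0:  # no rule fired: fall back to mid-range defaults
        return 50_000, 0.30, 0.6
    return int(num[0] / den), num[1] / den, num[2] / den
```

With this toy rule base, a stable agent (small reward variation, small error) keeps a modest buffer with a high recent-sample ratio, while an unstable one is steered toward a larger buffer and stronger prioritization; the actual system in the paper would be called once per evaluation interval to update the replay buffer's configuration.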

Keywords

