Optimizing Deep Q-Networks with fuzzy inference-based adaptive replay buffer management

Document Type : Research Paper

Authors

1 Department of Computer Engineering, Lorestan University, Khorramabad, Iran.

2 Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran.

Abstract

Deep reinforcement learning algorithms, such as Deep Q-Networks (DQN), require careful tuning of replay memory parameters. In standard DQN implementations, these parameters remain fixed, which conflicts with the dynamic nature of the learning process, where environmental conditions and reward stability change continuously. This mismatch often results in unstable learning or slow convergence. In this paper, we present a fuzzy logic-based system that adaptively adjusts three key replay memory parameters: memory size, the ratio of recent samples, and priority weight. The proposed fuzzy system evaluates the agent's state by monitoring reward variations and average training errors, and updates these parameters accordingly to maintain suitable values throughout training. To assess the effectiveness of the proposed approach, we compared it with conventional DQN and PER-DQN methods across three benchmark reinforcement learning environments: CartPole-v1, LunarLander-v2, and Taxi-v3. Experimental and statistical analyses demonstrate that our method improves average rewards, reduces training time, and enhances learning stability.
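To make the adaptation idea concrete, the following is a minimal, hypothetical sketch of how a fuzzy inference step of this kind could map the two monitored signals (normalized reward variation and average training error, both assumed in [0, 1]) to the three replay parameters. The membership functions, rule base, and output values are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of fuzzy-style adaptive replay-parameter tuning.
# Thresholds, rules, and output values are illustrative, not from the paper.

def tri(x, a, b, c):
    """Triangular membership function on [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def adapt_replay_params(reward_variation, avg_td_error):
    """Map two normalized inputs in [0, 1] to
    (memory_size, recent_ratio, priority_weight) via a tiny
    Mamdani-style rule base with weighted-average defuzzification."""
    # Fuzzify each input into low / medium / high membership grades.
    grades = {}
    for name, x in (("rv", reward_variation), ("err", avg_td_error)):
        grades[name] = {
            "low": tri(x, -0.5, 0.0, 0.5),
            "med": tri(x, 0.0, 0.5, 1.0),
            "high": tri(x, 0.5, 1.0, 1.5),
        }
    # Rule base: unstable rewards or large errors -> larger memory,
    # smaller share of recent samples, stronger prioritization.
    # Consequents are crisp singletons: (memory_size, recent_ratio, weight).
    rules = [
        ("low", "low", 20_000, 0.50, 0.4),
        ("low", "high", 50_000, 0.30, 0.8),
        ("high", "low", 50_000, 0.30, 0.7),
        ("high", "high", 100_000, 0.15, 0.9),
        ("med", "med", 50_000, 0.30, 0.6),
    ]
    num, den = [0.0, 0.0, 0.0], 0.0
    for rv_lvl, err_lvl, mem, ratio, w in rules:
        strength = min(grades["rv"][rv_lvl], grades["err"][err_lvl])  # AND = min
        den += strength
        for i, v in enumerate((mem, ratio, w)):
            num[i] += strength * v
    if den == 0.0:  # no rule fired: fall back to mid-range defaults
        return 50_000, 0.30, 0.6
    return int(num[0] / den), num[1] / den, num[2] / den
```

With this toy rule base, a stable agent (small reward variation, small error) keeps a modest buffer with a high recent-sample ratio, while an unstable one is steered toward a larger buffer and stronger prioritization; the actual system in the paper would be called once per evaluation interval to update the replay buffer's configuration.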

Keywords

