Integrating Fuzzy Logic with Deep Reinforcement Learning to Enhance Financial Portfolio Management

Document Type: Research Paper

Authors

1 Intelligent Data Processing Laboratory (IDPL), Department of Electrical Engineering, Shahid Bahonar University of Kerman, Kerman, Iran

2 University of Kerman, Kerman, Iran

Abstract

Portfolio management is challenging because uncertainty and volatility in financial markets make precise asset allocation and return maximization difficult. This paper presents a novel deep reinforcement learning (DRL) approach enhanced with fuzzy trend indicators to improve portfolio decision-making. In the proposed DRL framework, a convolutional neural network (CNN)-based policy network learns to optimize asset allocations through interactions with the market. Fuzzy trend indicators are incorporated as additional input features, enabling the model to better capture market uncertainties and ambiguous trends. Because they provide a more flexible representation of market conditions, these indicators allow the model to adjust portfolio allocations dynamically as trends change, leading to more precise asset allocation decisions and improved portfolio performance. The proposed model is trained and evaluated on historical stock data from the Brazilian stock market covering the period from 2011 to 2020. The dataset includes daily high, low, and closing prices, which serve as the basis for model training and validation. Experimental results show that the fuzzy-enhanced model outperforms some state-of-the-art strategies in terms of both returns and adaptability to volatile market conditions.
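
As an illustration of how fuzzy trend indicators might be appended to price features, the following minimal sketch maps each daily return to membership degrees in three fuzzy sets (down, flat, up). The abstract does not specify the membership functions, so the linear shapes, the function name `fuzzy_trend_features`, and the `width` parameter are assumptions made purely for illustration, not the method used in the paper.

```python
from typing import List, Tuple


def fuzzy_trend_features(
    closes: List[float], width: float = 0.02
) -> List[Tuple[float, float, float]]:
    """Return (down, flat, up) membership degrees for each daily return.

    Hypothetical design: `width` is the return magnitude at which a move
    counts as fully 'up' or fully 'down'; memberships vary linearly
    in between and always sum to 1.
    """
    features = []
    for prev, curr in zip(closes, closes[1:]):
        ret = curr / prev - 1.0              # simple daily return
        r = max(-width, min(width, ret))     # clip to [-width, width]
        down = max(0.0, -r / width)          # grows as returns fall
        up = max(0.0, r / width)             # grows as returns rise
        flat = 1.0 - abs(r) / width          # peaks at zero return
        features.append((down, flat, up))
    return features


if __name__ == "__main__":
    # Toy closing prices, not real market data.
    for row in fuzzy_trend_features([100.0, 101.0, 101.0, 98.98]):
        print(row)
```

In a setup like the one described, such per-asset membership values could be stacked as extra input channels alongside the high, low, and closing prices before being passed to the policy network.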
