[1] R. Abdelkawy, W. M. Abdelmoez, A. Shoukry, A synchronous deep reinforcement learning model for automated
multi-stock trading, Progress in Artificial Intelligence, 10(1) (2021), 83-97.
https://doi.org/10.1007/s13748-020-00225-z
[2] S. Almahdi, S. Y. Yang, An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent
reinforcement learning with expected maximum drawdown, Expert Systems with Applications, 87 (2017), 267-279.
https://doi.org/10.1016/j.eswa.2017.06.023
[3] C. Betancourt, W. H. Chen, Deep reinforcement learning for portfolio management of markets with a dynamic
number of assets, Expert Systems with Applications, 164 (2021), 114002.
https://doi.org/10.1016/j.eswa.2020.114002
[4] A. Borodin, R. El-Yaniv, V. Gogan, Can we learn to beat the best stock?, Journal of Artificial Intelligence Research,
21 (2004), 579-594.
https://doi.org/10.1613/jair.1336
[5] E. Chong, C. Han, F. C. Park, Deep learning networks for stock market analysis and prediction: Methodology, data
representations, and case studies, Expert Systems with Applications, 83 (2017), 187-205.
https://doi.org/10.1016/j.eswa.2017.04.030
[6] C. D. S. B. Costa, A. H. R. Costa, POE: A general portfolio optimization environment for FinRL, Brazilian Workshop
on Artificial Intelligence in Finance (BWAIF), (2023), 132-143.
https://doi.org/10.5753/bwaif.2023.231144
[7] T. M. Cover, Universal portfolios, Mathematical Finance, 1(1) (1991), 1-29.
https://doi.org/10.1111/j.1467-9965.1991.tb00002.x
[8] P. Das, A. Banerjee, Meta optimization and its application to portfolio selection, Proceedings of the 17th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, (2011), 1163-1171.
https://doi.org/10.1145/2020408.2020588
[9] Y. Deng, F. Bao, Y. Kong, Z. Ren, Q. Dai, Deep direct reinforcement learning for financial signal representation
and trading, IEEE Transactions on Neural Networks and Learning Systems, 28(3) (2016), 653-664.
https://doi.org/10.1109/TNNLS.2016.2522401
[11] L. Györfi, G. Lugosi, F. Udina, Nonparametric kernel-based sequential investment strategies, Mathematical Finance:
An International Journal of Mathematics, Statistics and Financial Economics, 16(2) (2006), 337-357.
https://doi.org/10.1111/j.1467-9965.2006.00274.x
[12] Z. Hao, H. Zhang, Y. Zhang, Stock portfolio management by using fuzzy ensemble deep reinforcement learning algorithm, Journal of Risk and Financial Management, 16(3) (2023), 201.
https://doi.org/10.3390/jrfm16030201
[13] J. B. Heaton, N. G. Polson, J. H. Witte, Deep learning for finance: Deep portfolios, Applied Stochastic Models in
Business and Industry, 33(1) (2017), 3-12.
https://doi.org/10.1002/asmb.2209
[14] D. P. Helmbold, R. E. Schapire, Y. Singer, M. K. Warmuth, On-line portfolio selection using multiplicative updates,
Mathematical Finance, 8(4) (1998), 325-347.
https://doi.org/10.1111/1467-9965.00058
[15] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation, 9(8) (1997), 1735-1780.
https://doi.org/10.1162/neco.1997.9.8.1735
[16] G. Jeong, H. Y. Kim, Improving financial trading decisions using deep Q-learning: Predicting the number of
shares, action strategies, and transfer learning, Expert Systems with Applications, 117 (2019), 125-138.
https://doi.org/10.1016/j.eswa.2018.09.036
[21] P. Koratamaddi, K. Wadhwani, M. Gupta, S. G. Sanjeevi, Market sentiment-aware deep reinforcement learning
approach for stock portfolio allocation, Engineering Science and Technology, an International Journal, 24(4) (2021),
848-859.
https://doi.org/10.1016/j.jestch.2021.01.007
[22] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings
of the IEEE, 86(11) (1998), 2278-2324.
https://doi.org/10.1109/5.726791
[25] B. Li, S. C. Hoi, P. Zhao, V. Gopalkrishnan, Confidence weighted mean reversion strategy for on-line portfolio
selection, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR
Workshop and Conference Proceedings, (2011), 434-442.
https://doi.org/10.1145/2435209.2435213
[28] T. Lillicrap, Continuous control with deep reinforcement learning, arXiv Preprint arXiv:1509.02971, (2015).
https://doi.org/10.48550/arXiv.1509.02971
[29] C. T. Lin, C. M. Yeh, S. F. Liang, J. F. Chung, N. Kumar, Support-vector-based fuzzy neural network for pattern
classification, IEEE Transactions on Fuzzy Systems, 14(1) (2006), 31-41.
https://doi.org/10.1109/TFUZZ.2005.861604
[30] Y. Liu, D. Mikriukov, O. C. Tjahyadi, G. Li, T. R. Payne, Y. Yue, K. Siddique, K. L. Man, Revolutionizing
financial portfolio management: The non-stationary transformer’s fusion of macroeconomic indicators and sentiment
analysis in a deep reinforcement learning framework, Applied Sciences, 14(1) (2023).
https://doi.org/10.3390/app14010274
[31] X. Y. Liu, H. Yang, Q. Chen, R. Zhang, L. Yang, B. Xiao, C. D. Wang, FinRL: A deep reinforcement learning
library for automated stock trading in quantitative finance, arXiv Preprint arXiv:2011.09607, (2020).
https://doi.org/10.48550/arXiv.2011.09607
[34] H. M. Markowitz, Portfolio selection, The Journal of Finance, 7 (1952), 77-91.
https://doi.org/10.1111/j.1540-6261.1952.tb01525.x
[35] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu, Asynchronous
methods for deep reinforcement learning, International Conference on Machine Learning, (2016).
https://doi.org/10.48550/arXiv.1602.01783
[36] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller, Playing Atari with deep
reinforcement learning, arXiv Preprint arXiv:1312.5602, (2013).
https://doi.org/10.48550/arXiv.1312.5602
[39] M. Rasoulzadeh, S. A. Edalatpanah, M. Fallah, S. E. Najafi, A hybrid model for choosing the optimal stock
portfolio under intuitionistic fuzzy sets, Iranian Journal of Fuzzy Systems, 21(2) (2025), 161-179.
https://doi.org/10.22111/ijfs.2024.45118.7968
[40] M. Rezaei, H. Nezamabadi-Pour, A taxonomy of literature reviews and experimental study of deep reinforcement
learning in portfolio management, Artificial Intelligence Review, 58(3) (2025), 1-46.
https://doi.org/10.1007/s10462-024-11066-w
[41] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning representations by back-propagating errors, Nature,
323(6088) (1986), 533-536.
https://doi.org/10.1038/323533a0
[44] S. Shi, J. Li, G. Li, P. Pan, Q. Chen, Q. Sun, GPM: A graph convolutional network based reinforcement learning
framework for portfolio management, Neurocomputing, 498 (2022), 14-27.
https://doi.org/10.1016/j.neucom.2022.04.105
[45] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A.
Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, D. Hassabis, Mastering the game of
Go without human knowledge, Nature, 550(7676) (2017), 354-359.
https://doi.org/10.1038/nature24270
[46] F. Soleymani, E. Paquet, Financial portfolio optimization with online deep reinforcement learning and restricted
stacked autoencoder-DeepBreath, Expert Systems with Applications, 156 (2020), 113456.
https://doi.org/10.1016/j.eswa.2020.113456
[47] F. Soleymani, E. Paquet, Deep graph convolutional reinforcement learning for financial portfolio management-DeepPocket, Expert Systems with Applications, 182 (2021), 115127.
https://doi.org/10.1016/j.eswa.2021.115127
[49] R. S. Sutton, A. G. Barto, Reinforcement learning: An introduction, Cambridge: MIT Press, 1(1) (2018).
https://doi.org/10.1017/S0263574799271172
[50] R. S. Sutton, D. A. McAllester, S. P. Singh, Y. Mansour, Policy gradient methods for reinforcement learning
with function approximation, Advances in Neural Information Processing Systems (NeurIPS), 13 (1999), 1057-1063.
https://dl.acm.org/doi/10.5555/3009657.3009806
[51] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention
is all you need, Advances in Neural Information Processing Systems (NeurIPS), 30 (2017), 6000-6010.
https://dl.acm.org/doi/10.5555/3295222.3295349
[53] C. J. Watkins, P. Dayan, Q-learning, Machine Learning, 8(3-4) (1992), 279-292.
https://doi.org/10.1007/BF00992698
[54] Y. Wu, E. Mansimov, R. B. Grosse, S. Liao, J. Ba, Scalable trust-region method for deep reinforcement learning
using Kronecker-factored approximation, Advances in Neural Information Processing Systems (NeurIPS), 30 (2017),
5285-5294.
https://dl.acm.org/doi/10.5555/3295222.3295280
[55] X. Wu, D. A. Ralescu, Y. Liu, A new quadratic deviation of fuzzy random variable and its application to portfolio
optimization, Iranian Journal of Fuzzy Systems, 17(3) (2020), 1-18.
https://doi.org/10.22111/ijfs.2020.5344
[56] D. Wu, X. Wang, J. Su, B. Tang, S. Wu, A labeling method for financial time series prediction based on trends,
Entropy, 22(10) (2020), 1162.
https://doi.org/10.3390/e22101162
[57] K. Xu, Y. Zhang, D. Ye, P. Zhao, M. Tan, Relation-aware transformer for portfolio policy learning, Proceedings
of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence, (2021),
4647-4653.
https://doi.org/10.24963/ijcai.2020/641
[58] H. Yang, X. Y. Liu, S. Zhong, A. Walid, Deep reinforcement learning for automated stock trading: An ensemble
strategy, Proceedings of the First ACM International Conference on AI in Finance (ICAIF), 20 (2020).
https://doi.org/10.1145/3383455.3422540
[60] L. A. Zadeh, Fuzzy sets, Information and Control, 8(3) (1965), 338-353.
https://doi.org/10.1016/S0019-9958(65)90241-X
[61] Y. Zhang, P. Zhao, B. Li, Q. Wu, J. Huang, M. Tan, Cost-sensitive portfolio selection via deep reinforcement
learning, IEEE Transactions on Knowledge and Data Engineering, 34(1) (2020).
https://doi.org/10.1109/TKDE.2020.2979700
[62] T. Zhao, X. Ma, X. Li, C. Zhang, Asset correlation based deep reinforcement learning for the portfolio selection,
Expert Systems with Applications, 221 (2023), 119707.
https://doi.org/10.1016/j.eswa.2023.119707
[63] https://github.com/AI4Finance-Foundation/FinRL/blob/master/examples/FinRL_PortfolioOptimizationEnv_Demo.ipynb