[1] D. Allahverdy, A. Fakharian, M. B. Menhaj, Back-stepping integral sliding mode control with iterative learning
control algorithm for quadrotor UAVs, Journal of Electrical Engineering and Technology, 14(6) (2019), 2539-2547.
https://doi.org/10.1007/s42835-019-00257-z
[2] A. Barreto, C. Anderson, Restricted gradient-descent algorithm for value-function approximation in reinforcement learning, Artificial Intelligence, 172(4-5) (2008), 454-482.
https://doi.org/10.1016/j.artint.2007.08.001
[3] A. Barakat, P. Bianchi, J. Lehmann, Analysis of a target-based actor-critic algorithm with linear function approximation,
arXiv preprint arXiv:2106.07472, (2021).
https://doi.org/10.48550/arXiv.2106.07472
[4] D. P. Bertsekas, J. N. Tsitsiklis, Neuro-dynamic programming, Belmont, MA: Athena Scientific, 1996.
[5] L. Buşoniu, D. Ernst, B. De Schutter, R. Babuška, Online least-squares policy iteration for reinforcement learning control, Proceedings of the 2010 American Control Conference, Baltimore, MD, USA, (2010), 486-491.
https://doi.org/10.1109/ACC.2010.5530856
[6] L. Buşoniu, A. Lazaric, M. Ghavamzadeh, R. Munos, R. Babuška, B. De Schutter, Least-squares methods for policy iteration, In: Wiering, M., van Otterlo, M. (Eds.), Reinforcement Learning: State-of-the-Art, Adaptation, Learning, and Optimization, 12, Springer, Heidelberg, Germany, (2012), 75-109.
https://doi.org/10.1007/978-3-642-27645-3_3
[7] Y. Cui, T. Matsubara, K. Sugimoto, Kernel dynamic policy programming: Applicable reinforcement learning to robot systems with high dimensional states, Neural Networks, 94 (2017), 13-23.
https://doi.org/10.1016/j.neunet.2017.06.007
[8] V. Derhami, V. J. Majd, M. N. Ahmadabadi, Fuzzy Sarsa learning and the proof of existence of its stationary points, Asian Journal of Control, (2008), 535-549.
https://doi.org/10.1002/asjc.54
[9] Y. Duan, X. Chen, R. Houthooft, J. Schulman, P. Abbeel, Benchmarking deep reinforcement learning for continuous
control, arXiv preprint arXiv:1604.06778, (2016).
https://doi.org/10.48550/arXiv.1604.06778
[10] K. Främling, Light-weight reinforcement learning with function approximation for real-life control tasks, Proceedings of the 5th International Conference on Informatics in Control, Automation and Robotics, Funchal, Madeira, Portugal, (2008), 127-134.
https://doi.org/10.5220/0001484001270134
[11] F. Ghorbani, V. Derhami, M. Afsharchi, Fuzzy least square policy iteration and its mathematical analysis, International
Journal of Fuzzy Systems, (2016), 1-14.
https://doi.org/10.1007/s40815-016-0270-1
[12] R. A. Howard, Dynamic programming and Markov processes, New York: Wiley, 1960.
[13] K. S. Hwang, S. W. Tan, M. C. Tsai, Reinforcement learning to adaptive control of nonlinear systems, IEEE Transactions on Systems, Man, and Cybernetics, Part B, 33(3) (2003), 514-521.
https://doi.org/10.1109/TSMCB.2003.811112
[14] H. S. Jakab, L. Csató, Sparse approximations to value functions in reinforcement learning, In: Koprinkova-Hristova, P., Mladenov, V., Kasabov, N. (Eds.), Artificial Neural Networks, Springer Series in Bio-/Neuroinformatics, 4, Springer, Cham, (2015).
https://doi.org/10.1007/978-3-319-09903-3_14
[15] Y. Jia, X. Y. Zhou, Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms,
Journal of Machine Learning Research, 23(275) (2022), 12603-12652.
https://doi.org/10.2139/ssrn.3969101
[17] Y. J. Liu, L. Tang, S. Tong, C. P. Chen, D. J. Li, Reinforcement learning design-based adaptive tracking control
with less learning parameters for nonlinear discrete-time MIMO systems, IEEE Transactions on Neural Networks
and Learning Systems, 26(1) (2015), 165-176.
https://doi.org/10.1109/TNNLS.2014.2360724
[18] I. Nishikawa, K. Matsunaga, An unsupervised learning of a layered network and its application to a motion acquisition,
Proceedings of the 2002 International Joint Conference on Neural Networks. IJCNN’02 (Cat. No.02CH37290),
Honolulu, HI, USA, 2 (2002), 1667-1672.
https://doi.org/10.1109/IJCNN.2002.1007768
[19] W. Rudin, Principles of mathematical analysis, 3rd ed. New York, NY, USA: McGraw-Hill Education, 1976.
[22] A. Sheikhlar, A. Fakharian, Online policy iteration-based tracking control of four wheeled omni-directional robots, Journal of Dynamic Systems, Measurement, and Control, 140(8) (2018), 081017.
https://doi.org/10.1115/1.4039287
[23] J. Sherman, W. J. Morrison, Adjustment of an inverse matrix corresponding to a change in one element of a given matrix, The Annals of Mathematical Statistics, 21(1) (1950), 124-127.
https://doi.org/10.1214/aoms/1177729893
[24] N. Snehal, W. Pooja, K. Sonam, S. R. Wagh, N. M. Singh, Control of an acrobot system using reinforcement
learning with probabilistic policy search, 2021 Australian and New Zealand Control Conference (ANZCC), Gold
Coast, Australia, (2021), 68-73.
https://doi.org/10.1109/ANZCC53563.2021.9628194
[25] E. H. Sumiea, S. J. Abdulkadir, H. Alhussian, S. M. Al-Selwi, A. Alqushaibi, M. G. Ragab, S. M. Fati, Deep deterministic policy gradient algorithm: A systematic review, Heliyon, 10 (2024), e30697.
https://doi.org/10.1016/j.heliyon.2024.e30697
[26] R. S. Sutton, A. G. Barto, Reinforcement learning: An introduction, Second Edition, MIT Press, Cambridge, MA, 2018.
[28] B. Varga, B. Kulcs´ar, M. H. Chehreghani, Deep Q-learning: A robust control approach, International Journal of
Robust and Nonlinear Control, 33(1) (2023), 526-554.
[29] X. Xu, D. Hu, X. Lu, Kernel-based least squares policy iteration for reinforcement learning, IEEE Transactions on Neural Networks, 18(4) (2007), 973-992.
https://doi.org/10.1109/TNN.2007.899161
[30] X. Xu, C. Liu, S. X. Yang, D. Hu, Hierarchical approximate policy iteration with binary-tree state space decomposition, IEEE Transactions on Neural Networks, 22(12) (2011), 1863-1877.
https://doi.org/10.1109/TNN.2011.2168422
[31] S. Yahyaa, B. Manderick, Knowledge gradient for online reinforcement learning, In: Duval, B., van den Herik, J., Loiseau, S., Filipe, J. (Eds.), Agents and Artificial Intelligence, ICAART 2014, Lecture Notes in Computer Science, 8946, Springer, Cham, (2014), 103-118.
https://doi.org/10.1007/978-3-319-25210-0_7
[32] M. Zaki, A. Mohan, A. Gopalan, S. Mannor, Actor-critic based improper reinforcement learning, arXiv preprint arXiv:2207.09090, (2022).
https://doi.org/10.48550/arXiv.2207.09090