Page 59 - Read Online
P. 59

Qi et al. Intell Robot 2021;1(1):18-57  I http://dx.doi.org/10.20517/ir.2021.02      Page 54



                   2017;2017:70–76.
               25.  Taylor ME. Teaching reinforcement learning with mario: an argument and case study. In: Second AAAI Symposium on Educational
                   Advances in Artificial Intelligence; 2011. Available from: https://www.aaai.org/ocs/index.php/EAAI/EAAI11/paper/viewPaper/3515.
               26.  Holcomb SD, Porter WK, Ault SV, Mao G, Wang J. Overview on deepmind and its alphago zero ai. In: Proceedings of the 2018
                   international conference on big data and education; 2018. pp. 67–71.
               27.  Watkins CJ, Dayan P. Q­learning. Mach Learn 1992;8:279–92. Available from: https://link.springer.com/content/pdf/10.1007/BF
                   00992698.pdf.
               28.  Thorpe TL. Vehicle traffic light control using sarsa. In: Online]. Available: citeseer. ist. psu. edu/thorpe97vehicle. html. Citeseer; 1997.
                   Available from: https://citeseer.ist.psu.edu/thorpe97vehicle.html.
               29.  Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. In: Xing EP, Jebara T, editors. Proceedings of the
                   31st International Conference on Machine Learning. vol. 32 of Proceedings of Machine Learning Research. Bejing, China: PMLR;
                   2014. pp. 387–95. Available from: https://proceedings.mlr.press/v32/silver14.html.
               30.  Williams RJ. Simple statistical gradient­following algorithms for connectionist reinforcement learning. Mach Learn
                   1992;8:229–56.
               31.  Konda VR, Tsitsiklis JN. Actor­critic algorithms. In: advances in neural information processing systems; 2000. pp. 1008–14. Available
                   from: https://proceedings.neurips.cc/paper/1786­actor­critic­algorithms.pdf.
               32.  Henderson P, Islam R, Bachman P, et al. Deep reinforcement learning that matters. In: Proceedings of the AAAI conference on
                   artificial intelligence. vol. 32; 2018. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/11694.
               33.  Lei L, Tan Y, Dahlenburg G, Xiang W, Zheng K. Dynamic energy dispatch based on deep reinforcement learning in IoT­Driven smart
                   isolated microgrids. IEEE Internet Things 2021;8:7938–53.
               34.  Lei L, Xu H, Xiong X, Zheng K, Xiang W, et al. Multiuser resource control with deep reinforcement learning in IoT edge computing.
                   IEEE Internet Things 2019;6:10119–33.
               35.  Ohnishi S, Uchibe E, Yamaguchi Y, et al. Constrained deep q­learning gradually approaching ordinary q­learning. Front
                   Neurorobotics 2019;13:103.
               36.  Peng J, Williams RJ. Incremental multi­step Q­learning. In: machine learning proceedings 1994. Elsevier; 1994. pp. 226–32.
               37.  Mnih V, Kavukcuoglu K, Silver D, et al. Human­level control through deep reinforcement learning. Nature 2015;518:529–33.
               38.  Lei L, Tan Y, Zheng K, et al. Deep reinforcement learning for autonomous internet of things: model, applications and challenges.
                   IEEE Communications Surveys Tutorials 2020;22:1722–60.
               39.  Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double q­learning. In: Proceedings of the AAAI conference on
                   artificial intelligence. vol. 30; 2016. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/10295.
               40.  Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. arXiv preprint arXiv:151105952 2015. Available from:
                   https://arxiv.org/abs/1511.05952.
               41.  Gu S, Lillicrap TP, Ghahramani Z, Turner RE, Levine S. Q­Prop: sample­efficient policy gradient with an off­policy critic. CoRR
                   2016;abs/1611.02247. Available from: http://arxiv.org/abs/1611.02247.
               42.  Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor­critic: off­policy maximum entropy deep reinforcement learning with a stochas­ tic
                   actor. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of
                   Machine Learning Research. PMLR; 2018. pp. 1861–70. Available from: https://proceedings.mlr.press/v80/haarnoja18b.html.
               43.  Mnih V, Badia AP, Mirza M, et al. Asynchronous methods for deep reinforcement learning. In: Balcan MF, Weinberger KQ, editors.
                   Proceedings of The 33rd International Conference on Machine Learning. vol. 48 of Proceedings of Machine Learning Research. New
                   York, New York, USA: PMLR; 2016. pp. 1928–37. Available from: https://proceedings.mlr.press/v48/mniha1 6.html.
               44.  Lillicrap TP, Hunt JJ, Pritzel A, et al.  Continuous control with deep reinforcement learning.  arXiv preprint arXiv:150902971
                   2015. Available from: https://arxiv.org/abs/1509.02971.
               45.  Barth­Maron G, Hoffman MW, Budden D, et al. Distributed distributional deterministic policy gradients. CoRR 2018;abs/
                   1804.08617. Available from: http://arxiv.org/abs/1804.08617.
               46.  Fujimoto S, van Hoof H, Meger D. Addressing function approximation error in actor­critic methods. In: Dy J, Krause A, editors.
                   Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR;
                   2018. pp. 1587–96. Available from: https://proceedings.mlr.press/v80/fujimoto18a.html.
               47.  Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In: Bach F, Blei D, editors. Proceedings of the
                   32nd International Conference on Machine Learning. vol. 37 of Proceedings of Machine Learning Research. Lille, France: PMLR; 2015.
                   pp. 1889–97. Available from: https://proceedings.mlr.press/v37/schulman15.html.
               48.  Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. CoRR 2017;abs/1707.06347.
                   Available from: http://arxiv.org/abs/1707.06347.
               49.  Zhu P, Li X, Poupart P. On improving deep reinforcement learning for POMDPs. CoRR 2017;abs/1704.07978. Available from:
                   http://arxiv.org/abs/1704.07978.
               50.  Hausknecht M, Stone P. Deep recurrent q­learning for partially observable mdps. In: 2015 aaai fall symposium series; 2015. Available
                   from: https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/viewPaper/11673.
               51.  Heess N, Hunt JJ, Lillicrap TP, Silver D. Memory­based control with recurrent neural networks. CoRR 2015;abs/1512.04455. Available
   54   55   56   57   58   59   60   61   62   63   64