Page 59 - Read Online
P. 59
Qi et al. Intell Robot 2021;1(1):18-57 I http://dx.doi.org/10.20517/ir.2021.02 Page 54
2017;2017:70–76.
25. Taylor ME. Teaching reinforcement learning with mario: an argument and case study. In: Second AAAI Symposium on Educational
Advances in Artificial Intelligence; 2011. Available from: https://www.aaai.org/ocs/index.php/EAAI/EAAI11/paper/viewPaper/3515.
26. Holcomb SD, Porter WK, Ault SV, Mao G, Wang J. Overview on deepmind and its alphago zero ai. In: Proceedings of the 2018
international conference on big data and education; 2018. pp. 67–71.
27. Watkins CJ, Dayan P. Qlearning. Mach Learn 1992;8:279–92. Available from: https://link.springer.com/content/pdf/10.1007/BF
00992698.pdf.
28. Thorpe TL. Vehicle traffic light control using sarsa. In: Online]. Available: citeseer. ist. psu. edu/thorpe97vehicle. html. Citeseer; 1997.
Available from: https://citeseer.ist.psu.edu/thorpe97vehicle.html.
29. Silver D, Lever G, Heess N, et al. Deterministic policy gradient algorithms. In: Xing EP, Jebara T, editors. Proceedings of the
31st International Conference on Machine Learning. vol. 32 of Proceedings of Machine Learning Research. Bejing, China: PMLR;
2014. pp. 387–95. Available from: https://proceedings.mlr.press/v32/silver14.html.
30. Williams RJ. Simple statistical gradientfollowing algorithms for connectionist reinforcement learning. Mach Learn
1992;8:229–56.
31. Konda VR, Tsitsiklis JN. Actorcritic algorithms. In: advances in neural information processing systems; 2000. pp. 1008–14. Available
from: https://proceedings.neurips.cc/paper/1786actorcriticalgorithms.pdf.
32. Henderson P, Islam R, Bachman P, et al. Deep reinforcement learning that matters. In: Proceedings of the AAAI conference on
artificial intelligence. vol. 32; 2018. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/11694.
33. Lei L, Tan Y, Dahlenburg G, Xiang W, Zheng K. Dynamic energy dispatch based on deep reinforcement learning in IoTDriven smart
isolated microgrids. IEEE Internet Things 2021;8:7938–53.
34. Lei L, Xu H, Xiong X, Zheng K, Xiang W, et al. Multiuser resource control with deep reinforcement learning in IoT edge computing.
IEEE Internet Things 2019;6:10119–33.
35. Ohnishi S, Uchibe E, Yamaguchi Y, et al. Constrained deep qlearning gradually approaching ordinary qlearning. Front
Neurorobotics 2019;13:103.
36. Peng J, Williams RJ. Incremental multistep Qlearning. In: machine learning proceedings 1994. Elsevier; 1994. pp. 226–32.
37. Mnih V, Kavukcuoglu K, Silver D, et al. Humanlevel control through deep reinforcement learning. Nature 2015;518:529–33.
38. Lei L, Tan Y, Zheng K, et al. Deep reinforcement learning for autonomous internet of things: model, applications and challenges.
IEEE Communications Surveys Tutorials 2020;22:1722–60.
39. Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double qlearning. In: Proceedings of the AAAI conference on
artificial intelligence. vol. 30; 2016. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/10295.
40. Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. arXiv preprint arXiv:151105952 2015. Available from:
https://arxiv.org/abs/1511.05952.
41. Gu S, Lillicrap TP, Ghahramani Z, Turner RE, Levine S. QProp: sampleefficient policy gradient with an offpolicy critic. CoRR
2016;abs/1611.02247. Available from: http://arxiv.org/abs/1611.02247.
42. Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actorcritic: offpolicy maximum entropy deep reinforcement learning with a stochas tic
actor. In: Dy J, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of
Machine Learning Research. PMLR; 2018. pp. 1861–70. Available from: https://proceedings.mlr.press/v80/haarnoja18b.html.
43. Mnih V, Badia AP, Mirza M, et al. Asynchronous methods for deep reinforcement learning. In: Balcan MF, Weinberger KQ, editors.
Proceedings of The 33rd International Conference on Machine Learning. vol. 48 of Proceedings of Machine Learning Research. New
York, New York, USA: PMLR; 2016. pp. 1928–37. Available from: https://proceedings.mlr.press/v48/mniha1 6.html.
44. Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. arXiv preprint arXiv:150902971
2015. Available from: https://arxiv.org/abs/1509.02971.
45. BarthMaron G, Hoffman MW, Budden D, et al. Distributed distributional deterministic policy gradients. CoRR 2018;abs/
1804.08617. Available from: http://arxiv.org/abs/1804.08617.
46. Fujimoto S, van Hoof H, Meger D. Addressing function approximation error in actorcritic methods. In: Dy J, Krause A, editors.
Proceedings of the 35th International Conference on Machine Learning. vol. 80 of Proceedings of Machine Learning Research. PMLR;
2018. pp. 1587–96. Available from: https://proceedings.mlr.press/v80/fujimoto18a.html.
47. Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In: Bach F, Blei D, editors. Proceedings of the
32nd International Conference on Machine Learning. vol. 37 of Proceedings of Machine Learning Research. Lille, France: PMLR; 2015.
pp. 1889–97. Available from: https://proceedings.mlr.press/v37/schulman15.html.
48. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. CoRR 2017;abs/1707.06347.
Available from: http://arxiv.org/abs/1707.06347.
49. Zhu P, Li X, Poupart P. On improving deep reinforcement learning for POMDPs. CoRR 2017;abs/1704.07978. Available from:
http://arxiv.org/abs/1704.07978.
50. Hausknecht M, Stone P. Deep recurrent qlearning for partially observable mdps. In: 2015 aaai fall symposium series; 2015. Available
from: https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/viewPaper/11673.
51. Heess N, Hunt JJ, Lillicrap TP, Silver D. Memorybased control with recurrent neural networks. CoRR 2015;abs/1512.04455. Available