19. Sutton RS, Barto AG. Introduction to reinforcement learning; 1998. Available from: https://login.cs.utexas.edu/sites/default/files/legacy_files/research/documents/1%20intro%20up%20to%20RL%3ATD.pdf [Last accessed on 30 Aug 2022].
20. Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In: International conference on machine learning.
PMLR; 2015. pp. 1889–97. Available from: https://proceedings.mlr.press/v37/schulman15.html [Last accessed on 30 Aug 2022].
21. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. CoRR 2017;abs/1707.06347.
Available from: http://arxiv.org/abs/1707.06347 [Last accessed on 30 Aug 2022].
22. Schulman J, Moritz P, Levine S, Jordan MI, Abbeel P. High-dimensional continuous control using generalized advantage estimation. In: Bengio Y, LeCun Y, editors. 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, Conference Track Proceedings; 2016. Available from: http://arxiv.org/abs/1506.02438 [Last accessed on 30 Aug 2022].
23. Mania H, Guy A, Recht B. Simple random search provides a competitive approach to reinforcement learning. CoRR 2018;abs/1803.07055.
Available from: http://arxiv.org/abs/1803.07055 [Last accessed on 30 Aug 2022].
24. Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy JG, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1856–65. Available from: http://proceedings.mlr.press/v80/haarnoja18b.html [Last accessed on 30 Aug 2022].
25. Song HF, Abdolmaleki A, Springenberg JT, et al. V-MPO: on-policy maximum a posteriori policy optimization for discrete and continuous control. OpenReview.net; 2020. Available from: https://openreview.net/forum?id=SylOlp4FvH [Last accessed on 30 Aug 2022].
26. Abdolmaleki A, Huang SH, Hasenclever L, et al. A distributional view on multi-objective policy optimization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13–18 July 2020, Virtual Event. vol. 119 of Proceedings of Machine Learning Research. PMLR; 2020. pp. 11–22. Available from: http://proceedings.mlr.press/v119/abdolmaleki20a.html [Last accessed on 30 Aug 2022].
27. Brakel P, Bohez S, Hasenclever L, Heess N, Bousmalis K. Learning coordinated terrain-adaptive locomotion by imitating a centroidal dynamics planner. CoRR 2021;abs/2111.00262. Available from: https://arxiv.org/abs/2111.00262 [Last accessed on 30 Aug 2022].
28. Gangapurwala S, Mitchell AL, Havoutis I. Guided constrained policy optimization for dynamic quadrupedal robot locomotion. IEEE
Robotics Autom Lett 2020;5:3642–49. DOI
29. Chen X, Wang C, Zhou Z, Ross KW. Randomized ensembled double Q-learning: learning fast without a model. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021. OpenReview.net; 2021. Available from: https://openreview.net/forum?id=AY8zfZm0tDd [Last accessed on 30 Aug 2022].
30. Smith L, Kew JC, Peng XB, et al. Legged robots that keep on learning: fine-tuning locomotion policies in the real world. In: 2022 IEEE International Conference on Robotics and Automation (ICRA); 2022. pp. 1–7. DOI
31. Coumans E, Bai Y. PyBullet, a Python module for physics simulation for games, robotics and machine learning; 2016–2021. Available from: http://pybullet.org.
32. Hwangbo J, Lee J, Hutter M. Per-contact iteration method for solving contact dynamics. IEEE Robotics Autom Lett 2018;3:895–902. Available from: https://doi.org/10.1109/LRA.2018.2792536 [Last accessed on 30 Aug 2022].
33. Todorov E, Erez T, Tassa Y. MuJoCo: a physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE; 2012. pp. 5026–33. DOI
34. Makoviychuk V, Wawrzyniak L, Guo Y, et al. Isaac gym: high performance GPU based physics simulation for robot learning. In: Vanschoren J, Yeung S, editors. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual; 2021. Available from: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/28dd2c7955ce926456240b2ff0100bde-Abstract-round2.html [Last accessed on 30 Aug 2022].
35. Rudin N, Hoeller D, Reist P, Hutter M. Learning to walk in minutes using massively parallel deep reinforcement learning. In: Conference on Robot Learning. PMLR; 2022. pp. 91–100. Available from: https://proceedings.mlr.press/v164/rudin22a.html [Last accessed on 30 Aug 2022].
36. Margolis GB, Yang G, Paigwar K, Chen T, Agrawal P. Rapid locomotion via reinforcement learning. arXiv preprint arXiv:2205.02824; 2022. DOI
37. Escontrela A, Peng XB, Yu W, et al. Adversarial motion priors make good substitutes for complex reward functions. arXiv e-prints 2022:arXiv:2203.15103. DOI
38. Vollenweider E, Bjelonic M, Klemm V, et al. Advanced skills through multiple adversarial motion priors in reinforcement learning. arXiv e-prints 2022:arXiv:2203.14912. DOI
39. Tan J, Zhang T, Coumans E, et al. Sim-to-real: learning agile locomotion for quadruped robots. In: Kress-Gazit H, Srinivasa SS, Howard T, Atanasov N, editors. Robotics: Science and Systems XIV, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA, June 26–30, 2018; 2018. Available from: http://www.roboticsproceedings.org/rss14/p10.html [Last accessed on 30 Aug 2022].
40. Hutter M, Gehring C, Jud D, et al. ANYmal - a highly mobile and dynamic quadrupedal robot. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2016. pp. 38–44. DOI
41. Ha S, Xu P, Tan Z, Levine S, Tan J. Learning to walk in the real world with minimal human effort. In: Conference on Robot Learning. PMLR; 2020. pp. 1110–20. Available from: https://proceedings.mlr.press/v155/ha21c.html [Last accessed on 30 Aug 2022].
42. Gangapurwala S, Geisert M, Orsolino R, Fallon M, Havoutis I. RLOC: terrain-aware legged locomotion using reinforcement learning and optimal control. IEEE Trans Robot 2022. DOI
43. Peng XB, van de Panne M. Learning locomotion skills using DeepRL: does the choice of action space matter? In: Teran J, Zheng C,
Spencer SN, Thomaszewski B, Yin K, editors. Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation,