19. Sutton RS, Barto AG. Introduction to reinforcement learning; 1998. Available from: https://login.cs.utexas.edu/sites/default/files/legacy_files/research/documents/1%20intro%20up%20to%20RL%3ATD.pdf [Last accessed on 30 Aug 2022].
20. Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In: International conference on machine learning.
PMLR; 2015. pp. 1889–97. Available from: https://proceedings.mlr.press/v37/schulman15.html [Last accessed on 30 Aug 2022].
21. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. CoRR 2017;abs/1707.06347.
Available from: http://arxiv.org/abs/1707.06347 [Last accessed on 30 Aug 2022].
22. Schulman J, Moritz P, Levine S, Jordan MI, Abbeel P. High-dimensional continuous control using generalized advantage estimation. In: Bengio Y, LeCun Y, editors. 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, Conference Track Proceedings; 2016. Available from: http://arxiv.org/abs/1506.02438 [Last accessed on 30 Aug 2022].
23. Mania H, Guy A, Recht B. Simple random search provides a competitive approach to reinforcement learning. CoRR 2018;abs/1803.07055.
Available from: http://arxiv.org/abs/1803.07055 [Last accessed on 30 Aug 2022].
24. Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy JG, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1856–65. Available from: http://proceedings.mlr.press/v80/haarnoja18b.html [Last accessed on 30 Aug 2022].
25. Song HF, Abdolmaleki A, Springenberg JT, et al. V-MPO: on-policy maximum a posteriori policy optimization for discrete and continuous control. OpenReview.net; 2020. Available from: https://openreview.net/forum?id=SylOlp4FvH [Last accessed on 30 Aug 2022].
26. Abdolmaleki A, Huang SH, Hasenclever L, et al. A distributional view on multi-objective policy optimization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13–18 July 2020, Virtual Event. vol. 119 of Proceedings of Machine Learning Research. PMLR; 2020. pp. 11–22. Available from: http://proceedings.mlr.press/v119/abdolmaleki20a.html [Last accessed on 30 Aug 2022].
27. Brakel P, Bohez S, Hasenclever L, Heess N, Bousmalis K. Learning coordinated terrain-adaptive locomotion by imitating a centroidal dynamics planner. CoRR 2021;abs/2111.00262. Available from: https://arxiv.org/abs/2111.00262 [Last accessed on 30 Aug 2022].
28. Gangapurwala S, Mitchell AL, Havoutis I. Guided constrained policy optimization for dynamic quadrupedal robot locomotion. IEEE
Robotics Autom Lett 2020;5:3642–49. DOI
29. Chen X, Wang C, Zhou Z, Ross KW. Randomized ensembled double Q-learning: learning fast without a model. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3–7, 2021. OpenReview.net; 2021. Available from: https://openreview.net/forum?id=AY8zfZm0tDd [Last accessed on 30 Aug 2022].
30. Smith L, Kew JC, Peng XB, et al. Legged robots that keep on learning: fine-tuning locomotion policies in the real world. In: 2022 IEEE International Conference on Robotics and Automation (ICRA); 2022. pp. 1–7. DOI
31. Coumans E, Bai Y. PyBullet, a Python module for physics simulation for games, robotics and machine learning; 2016–2021. Available from: http://pybullet.org.
32. Hwangbo J, Lee J, Hutter M. Per-contact iteration method for solving contact dynamics. IEEE Robotics Autom Lett 2018;3:895–902. Available from: https://doi.org/10.1109/LRA.2018.2792536 [Last accessed on 30 Aug 2022].
33. Todorov E, Erez T, Tassa Y. MuJoCo: a physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE; 2012. pp. 5026–33. DOI
34. Makoviychuk V, Wawrzyniak L, Guo Y, et al. Isaac gym: high performance GPU based physics simulation for robot learning. In: Vanschoren J, Yeung S, editors. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual; 2021. Available from: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/28dd2c7955ce926456240b2ff0100bde-Abstract-round2.html [Last accessed on 30 Aug 2022].
35. Rudin N, Hoeller D, Reist P, Hutter M. Learning to walk in minutes using massively parallel deep reinforcement learning. In: Conference on Robot Learning. PMLR; 2022. pp. 91–100. Available from: https://proceedings.mlr.press/v164/rudin22a.html [Last accessed on 30 Aug 2022].
36. Margolis GB, Yang G, Paigwar K, Chen T, Agrawal P. Rapid locomotion via reinforcement learning. arXiv preprint arXiv:2205.02824; 2022. DOI
37. Escontrela A, Peng XB, Yu W, et al. Adversarial motion priors make good substitutes for complex reward functions. arXiv e-prints 2022:arXiv:2203.15103. DOI
38. Vollenweider E, Bjelonic M, Klemm V, et al. Advanced skills through multiple adversarial motion priors in reinforcement learning. arXiv e-prints 2022:arXiv:2203.14912. DOI
39. Tan J, Zhang T, Coumans E, et al. Sim-to-real: learning agile locomotion for quadruped robots. In: Kress-Gazit H, Srinivasa SS, Howard T, Atanasov N, editors. Robotics: Science and Systems XIV, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA, June 26–30, 2018; 2018. Available from: http://www.roboticsproceedings.org/rss14/p10.html [Last accessed on 30 Aug 2022].
40. Hutter M, Gehring C, Jud D, et al. ANYmal - a highly mobile and dynamic quadrupedal robot. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2016. pp. 38–44. DOI
41. Ha S, Xu P, Tan Z, Levine S, Tan J. Learning to walk in the real world with minimal human effort. In: Conference on Robot Learning. PMLR; 2020. pp. 1110–20. Available from: https://proceedings.mlr.press/v155/ha21c.html [Last accessed on 30 Aug 2022].
42. Gangapurwala S, Geisert M, Orsolino R, Fallon M, Havoutis I. RLOC: terrain-aware legged locomotion using reinforcement learning and optimal control. IEEE Trans Robot 2022. DOI
43. Peng XB, van de Panne M. Learning locomotion skills using DeepRL: does the choice of action space matter? In: Teran J, Zheng C,
Spencer SN, Thomaszewski B, Yin K, editors. Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation,