19. Sutton RS, Barto AG. Introduction to reinforcement learning; 1998. Available from: https://login.cs.utexas.edu/sites/default/files/legacy_files/research/documents/1%20intro%20up%20to%20RL%3ATD.pdf [Last accessed on 30 Aug 2022].
20. Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In: International Conference on Machine Learning. PMLR; 2015. pp. 1889–97. Available from: https://proceedings.mlr.press/v37/schulman15.html [Last accessed on 30 Aug 2022].
21. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. CoRR 2017;abs/1707.06347. Available from: http://arxiv.org/abs/1707.06347 [Last accessed on 30 Aug 2022].
22. Schulman J, Moritz P, Levine S, Jordan MI, Abbeel P. High-dimensional continuous control using generalized advantage estimation. In: Bengio Y, LeCun Y, editors. 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings; 2016. Available from: http://arxiv.org/abs/1506.02438 [Last accessed on 30 Aug 2022].
23. Mania H, Guy A, Recht B. Simple random search provides a competitive approach to reinforcement learning. CoRR 2018;abs/1803.07055. Available from: http://arxiv.org/abs/1803.07055 [Last accessed on 30 Aug 2022].
24. Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy JG, Krause A, editors. Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. vol. 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 1856–65. Available from: http://proceedings.mlr.press/v80/haarnoja18b.html [Last accessed on 30 Aug 2022].
25. Song HF, Abdolmaleki A, Springenberg JT, et al. V-MPO: on-policy maximum a posteriori policy optimization for discrete and continuous control. OpenReview.net; 2020. Available from: https://openreview.net/forum?id=SylOlp4FvH [Last accessed on 30 Aug 2022].
26. Abdolmaleki A, Huang SH, Hasenclever L, et al. A distributional view on multi-objective policy optimization. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. vol. 119 of Proceedings of Machine Learning Research. PMLR; 2020. pp. 11–22. Available from: http://proceedings.mlr.press/v119/abdolmaleki20a.html [Last accessed on 30 Aug 2022].
27. Brakel P, Bohez S, Hasenclever L, Heess N, Bousmalis K. Learning coordinated terrain-adaptive locomotion by imitating a centroidal dynamics planner. CoRR 2021;abs/2111.00262. Available from: https://arxiv.org/abs/2111.00262 [Last accessed on 30 Aug 2022].
28. Gangapurwala S, Mitchell AL, Havoutis I. Guided constrained policy optimization for dynamic quadrupedal robot locomotion. IEEE Robotics Autom Lett 2020;5:3642–49. DOI
29. Chen X, Wang C, Zhou Z, Ross KW. Randomized ensembled double Q-learning: learning fast without a model. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net; 2021. Available from: https://openreview.net/forum?id=AY8zfZm0tDd [Last accessed on 30 Aug 2022].
30. Smith L, Kew JC, Peng XB, et al. Legged robots that keep on learning: fine-tuning locomotion policies in the real world. In: 2022 IEEE International Conference on Robotics and Automation (ICRA); 2022. pp. 1–7. DOI
31. Coumans E, Bai Y. PyBullet, a Python module for physics simulation for games, robotics and machine learning; 2016–2021. Available from: http://pybullet.org.
32. Hwangbo J, Lee J, Hutter M. Per-contact iteration method for solving contact dynamics. IEEE Robotics Autom Lett 2018;3:895–902. Available from: https://doi.org/10.1109/LRA.2018.2792536 [Last accessed on 30 Aug 2022].
33. Todorov E, Erez T, Tassa Y. MuJoCo: a physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE; 2012. pp. 5026–33. DOI
34. Makoviychuk V, Wawrzyniak L, Guo Y, et al. Isaac Gym: high performance GPU based physics simulation for robot learning. In: Vanschoren J, Yeung S, editors. Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual; 2021. Available from: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/28dd2c7955ce926456240b2ff0100bde-Abstract-round2.html [Last accessed on 30 Aug 2022].
35. Rudin N, Hoeller D, Reist P, Hutter M. Learning to walk in minutes using massively parallel deep reinforcement learning. In: Conference on Robot Learning. PMLR; 2022. pp. 91–100. Available from: https://proceedings.mlr.press/v164/rudin22a.html [Last accessed on 30 Aug 2022].
36. Margolis GB, Yang G, Paigwar K, Chen T, Agrawal P. Rapid locomotion via reinforcement learning. arXiv preprint arXiv:2205.02824; 2022. DOI
37. Escontrela A, Peng XB, Yu W, et al. Adversarial motion priors make good substitutes for complex reward functions. arXiv e-prints 2022:arXiv:2203.15103. DOI
38. Vollenweider E, Bjelonic M, Klemm V, et al. Advanced skills through multiple adversarial motion priors in reinforcement learning. arXiv e-prints 2022:arXiv:2203.14912. DOI
39. Tan J, Zhang T, Coumans E, et al. Sim-to-real: learning agile locomotion for quadruped robots. In: Kress-Gazit H, Srinivasa SS, Howard T, Atanasov N, editors. Robotics: Science and Systems XIV, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA, June 26-30, 2018; 2018. Available from: http://www.roboticsproceedings.org/rss14/p10.html [Last accessed on 30 Aug 2022].
40. Hutter M, Gehring C, Jud D, et al. ANYmal - a highly mobile and dynamic quadrupedal robot. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE; 2016. pp. 38–44. DOI
41. Ha S, Xu P, Tan Z, Levine S, Tan J. Learning to walk in the real world with minimal human effort. In: Conference on Robot Learning. vol. 155 of Proceedings of Machine Learning Research. PMLR; 2020. pp. 1110–20. Available from: https://proceedings.mlr.press/v155/ha21c.html [Last accessed on 30 Aug 2022].
42. Gangapurwala S, Geisert M, Orsolino R, Fallon M, Havoutis I. RLOC: terrain-aware legged locomotion using reinforcement learning and optimal control. IEEE Trans Robot 2022. DOI
43. Peng XB, van de Panne M. Learning locomotion skills using DeepRL: does the choice of action space matter? In: Teran J, Zheng C, Spencer SN, Thomaszewski B, Yin K, editors. Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation, SCA 2017; 2017.