Special Issue on Computer Applications


Funding

This work was supported by the National Natural Science Foundation of China (No. 62272188), the National Social Science Fund of China (No. 19BSH022), and the Fundamental Research Funds for the Central Universities (No. 2662022XXYJ001, No. 2662022JC004, No. 2662021JC008, No. 2662023XXPY005).

Projected Reward for Multi-robot Formation and Obstacle Avoidance

  • 1. College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China;
    2. Hubei Engineering Technology Research Center of Agricultural Big Data, Wuhan 430070, Hubei, China

Received date: 2023-06-29

  Online published: 2024-02-02


Cite this article

Ge X, Qin L, Sha Y. Projected reward for multi-robot formation and obstacle avoidance [J]. Journal of Applied Sciences, 2024, 42(1): 174-188. DOI: 10.3969/j.issn.0255-8297.2024.01.014

Abstract

To address excessive centralization, low system robustness, and poor formation stability in multi-robot cooperative formation tasks, this paper proposes the projected reward for multi-robot formation and obstacle avoidance (PRMFO) model. PRMFO achieves decentralized decision-making for multiple robots through a unified state representation method, which ensures consistency in processing the interaction information between each robot and the external environment. A projection-based reward mechanism built on this unified state representation vectorizes the reward in both the distance and direction dimensions, enriching the basis for each robot's decisions. To mitigate excessive centralization, an autonomous decision layer is established by integrating the unified state representation and the projected reward mechanism into the soft actor-critic (SAC) algorithm. Simulation results in the robot operating system (ROS) environment show that PRMFO improves the single-robot average return, success rate, and time metrics by 42%, 8%, and 9%, respectively, and keeps the multi-robot formation error within the range of 0 to 0.06, achieving high-precision formation.
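The projection-based reward described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function `projected_reward`, the weights `w_dist`/`w_dir`, and the use of the per-step displacement vector are illustrative assumptions. The idea shown is the vectorized reward: decompose a robot's progress into a distance component (the scalar projection of its displacement onto the goal direction) and a direction component (the cosine alignment between displacement and goal direction).

```python
import numpy as np

def projected_reward(prev_pos, curr_pos, goal, w_dist=1.0, w_dir=0.5):
    """Sketch of a projection-based reward (illustrative, not the paper's exact form).

    dist_term: scalar projection of the step displacement onto the goal
               direction (how much ground was gained toward the goal).
    dir_term:  cosine of the angle between displacement and goal direction
               (how well the heading agrees with the goal direction).
    """
    disp = curr_pos - prev_pos                      # displacement this step
    to_goal = goal - prev_pos                       # vector from old pose to goal
    goal_dir = to_goal / (np.linalg.norm(to_goal) + 1e-8)
    dist_term = float(np.dot(disp, goal_dir))       # progress along goal direction
    dir_term = dist_term / (np.linalg.norm(disp) + 1e-8)  # cosine alignment
    # Positive when moving toward the goal, negative when moving away.
    return w_dist * dist_term + w_dir * dir_term
```

In a full model, such a vectorized term would be combined with collision and formation-error penalties and fed to the SAC agent in each robot's autonomous decision layer; the weights here are placeholders.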
