To address excessive centralization, low system robustness, and formation instability in multi-robot formation tasks, this paper introduces the projected reward for multi-robot formation and obstacle avoidance (PRMFO) approach. PRMFO achieves decentralized decision-making through a unified state representation that processes information about interactions between robots and the external environment consistently. Building on this representation, a projected reward mechanism strengthens the decision-making basis by vectorizing rewards along the distance and direction dimensions. To further reduce centralization, an autonomous decision layer is established by integrating the soft actor-critic (SAC) algorithm with the unified state representation and the projected reward mechanism. Simulation results in the robot operating system (ROS) environment show that PRMFO improves average return, success rate, and time metrics by 42%, 8%, and 9%, respectively, and keeps the multi-robot formation error within 0 to 0.06, achieving a high level of accuracy.
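The projected reward mechanism is described above only at a high level. The sketch below illustrates one plausible way a scalar reward could be vectorized into distance and direction components for a single robot and goal; the function name, the 1/(1+d) distance shaping, and the heading/goal dot-product term are illustrative assumptions, not the paper's exact PRMFO formulation.

```python
import numpy as np

def projected_reward(robot_pos, robot_heading, goal_pos, base_reward):
    """Decompose a scalar reward into distance and direction components by
    projecting onto the robot-goal geometry (illustrative sketch only; the
    exact PRMFO formulation is defined in the paper's method section)."""
    to_goal = np.asarray(goal_pos, dtype=float) - np.asarray(robot_pos, dtype=float)
    dist = np.linalg.norm(to_goal)
    if dist < 1e-9:                      # robot already at the goal
        return base_reward, base_reward
    goal_dir = to_goal / dist            # unit vector toward the goal
    heading = np.array([np.cos(robot_heading), np.sin(robot_heading)])
    r_dist = base_reward / (1.0 + dist)                     # assumed distance shaping
    r_dir = base_reward * float(np.dot(heading, goal_dir))  # assumed direction shaping
    return r_dist, r_dir

# Example: a robot at the origin heading along +x, goal at (2, 0)
print(projected_reward([0.0, 0.0], 0.0, [2.0, 0.0], 1.0))   # -> (0.333..., 1.0)
```

In this hypothetical form, the two components give the learner separate signals for closing the distance to the goal and for keeping its heading aligned with it, which is the kind of richer decision-making basis the abstract attributes to reward vectorization.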