Journal of Applied Sciences (应用科学学报) ›› 2024, Vol. 42 ›› Issue (1): 174-188. doi: 10.3969/j.issn.0255-8297.2024.01.014

• Special Issue on Computer Applications •

Projected Reward for Multi-robot Formation and Obstacle Avoidance

GE Xing1,2, QIN Li1,2, SHA Ying1,2

  1. College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China;
  2. Hubei Engineering Technology Research Center of Agricultural Big Data, Wuhan 430070, Hubei, China
  • Received: 2023-06-29 Online: 2024-01-30 Published: 2024-02-02
  • Corresponding author: QIN Li, whose research interests include intelligent robotics and artificial intelligence. E-mail: qinli@mail.hzau.edu.cn
  • Supported by:
    National Natural Science Foundation of China (No. 62272188); General Program of the National Social Science Fund of China (No. 19BSH022); Fundamental Research Funds for the Central Universities (No. 2662022XXYJ001, No. 2662022JC004, No. 2662021JC008, No. 2662023XXPY005)

Abstract: To address the problems of excessive centralization, low system robustness, and poor formation stability in multi-robot cooperative formation tasks, this paper proposes the projected reward for multi-robot formation and obstacle avoidance (PRMFO) model, which realizes decentralized decision-making for multi-robot systems on top of a unified state representation. The unified state representation ensures consistency in how each robot processes its interactions with the external environment. Building on it, a projection-based reward mechanism vectorizes the reward along the distance and direction dimensions, enriching the basis for each robot's decisions. To mitigate excessive centralization, an autonomous decision layer is established by integrating the soft actor-critic (SAC) algorithm with the unified state representation and the projected reward mechanism, accomplishing cooperative multi-robot formation and obstacle avoidance. Simulation experiments in the robot operating system (ROS) environment show that PRMFO improves the single-robot average return, success rate, and time metrics by 42%, 8%, and 9%, respectively, and keeps the multi-robot formation error within 0 to 0.06, achieving high-precision multi-robot formation.
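
The abstract does not give the exact form of the projected reward. As a minimal sketch of the stated idea, vectorizing the reward into a distance component and a direction component, the Python snippet below rewards a robot for reducing its distance to the goal and for moving along the goal direction (the projection of its displacement onto the goal vector). The function name `projected_reward`, the weights `w_dist` and `w_dir`, and the specific formulas are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def projected_reward(pos, prev_pos, goal, w_dist=1.0, w_dir=0.5):
    """Hypothetical projection-based reward (not the paper's exact form):
    combines a distance term (how much closer the robot got to the goal)
    with a direction term (projection of the last displacement onto the
    goal direction), mirroring the distance/direction decomposition
    described in the abstract."""
    # Distance component: reduction in Euclidean distance to the goal.
    d_prev = np.linalg.norm(goal - prev_pos)
    d_now = np.linalg.norm(goal - pos)
    r_dist = d_prev - d_now

    # Direction component: projection of the displacement vector onto the
    # unit vector pointing from the previous position toward the goal.
    step = pos - prev_pos
    to_goal = goal - prev_pos
    norm = np.linalg.norm(to_goal)
    r_dir = 0.0 if norm < 1e-8 else float(step @ to_goal) / norm

    return w_dist * r_dist + w_dir * r_dir

# Example: moving from (0, 0) to (1, 0) with the goal at (5, 0) yields
# r_dist = 1.0 and r_dir = 1.0, so the total reward is 1.0 + 0.5 = 1.5.
r = projected_reward(np.array([1.0, 0.0]), np.array([0.0, 0.0]),
                     np.array([5.0, 0.0]))
print(round(r, 3))  # 1.5
```

Because both terms are signed, such a reward penalizes motion away from the goal as well, which is one plausible way a vectorized reward could enrich the decision signal compared with a distance-only scalar reward.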

Key words: deep reinforcement learning, multi-robot cooperation, formation and obstacle avoidance, projected reward
