Journal of Applied Sciences, 2025, Vol. 43, Issue (3): 415-436. doi: 10.3969/j.issn.0255-8297.2025.03.005

• Computer Science and Applications •

A Path Planning Algorithm for Mobile Robots Based on an Improved Deep Deterministic Policy Gradient

ZHANG Qingling1, NI Cui1, WANG Peng1,2, GONG Hui1

  1. School of Information Science and Electrical Engineering, Shandong Jiaotong University, Jinan 250357, Shandong, China;
    2. Institute of Automation, Shandong Academy of Sciences, Jinan 250013, Shandong, China
  • Received: 2023-08-31; Published: 2025-06-23
  • Corresponding author: NI Cui, associate professor; her research focuses on digital image processing. E-mail: emilync@126.com
  • Funding:
    China Postdoctoral Science Foundation (No. 2021M702030); Science and Technology Plan Project of the Shandong Provincial Department of Transport (No. 2021B120)

A Path Planning Algorithm for Mobile Robots Based on an Improved Deep Deterministic Policy Gradient

ZHANG Qingling1, NI Cui1, WANG Peng1,2, GONG Hui1   

  1. School of Information Science and Electrical Engineering, Shandong Jiaotong University, Jinan 250357, Shandong, China;
    2. Institute of Automation, Shandong Academy of Sciences, Jinan 250013, Shandong, China
  • Received: 2023-08-31; Published: 2025-06-23

Abstract: The deep deterministic policy gradient (DDPG) algorithm adopts the Actor-Critic framework, which ensures the continuity of a mobile robot's motion. However, when computing the value function (Q-value), the Critic network does not fully account for the differences among states and actions, which leads to inaccurate Q-value estimates. In addition, the reward function in DDPG is too sparse, which easily causes slow convergence during model training, and the random uniform sampling strategy cannot exploit the sample data efficiently and sufficiently. To address these problems, this paper builds on DDPG by introducing a dueling network to improve the accuracy of Q-value estimation, redesigning the reward function to guide the mobile robot to move more efficiently and reasonably, separating the single experience pool into two experience pools, and adopting a dynamic adaptive sampling mechanism to improve the efficiency of experience replay. Finally, experiments are conducted in a simulation environment built with the robot operating system (ROS) and the Gazebo platform. The results show that, compared with the DDPG algorithm, the proposed algorithm shortens training time by 17.8%, accelerates convergence by 57.46%, and raises the success rate by 3%. Compared with other algorithms, the proposed algorithm improves the stability of the model training process and greatly improves the efficiency and success rate of mobile robot path planning.
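The abstract does not detail how the dueling network is attached to the DDPG critic. The following is a minimal PyTorch sketch, under stated assumptions, of one common way to combine a state-value stream V(s) and an action-conditioned advantage stream A(s, a) into Q(s, a) for continuous actions; the class name, layer sizes, and aggregation rule are illustrative and not the authors' implementation.

```python
import torch
import torch.nn as nn

class DuelingCritic(nn.Module):
    """Hypothetical dueling critic for DDPG: Q(s, a) = V(s) + A(s, a)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        # shared encoder over the state
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # state-value stream V(s)
        self.value = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # advantage stream A(s, a), conditioned on the continuous action
        self.advantage = nn.Sequential(
            nn.Linear(hidden + action_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        h = self.encoder(state)
        v = self.value(h)                                   # V(s)
        a = self.advantage(torch.cat([h, action], dim=-1))  # A(s, a)
        return v + a                                        # Q(s, a)
```

Separating the value and advantage streams lets the critic learn how good a state is independently of the particular action taken, which is the usual motivation for improved Q-value estimates.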

Keywords: path planning, deep deterministic policy gradient, dueling network, experience pool separation, dynamic adaptive sampling

Abstract: The deep deterministic policy gradient (DDPG) algorithm utilizes an actor-critic framework to ensure smooth, continuous motion of mobile robots. However, the critic network does not adequately account for the differences among states and actions when estimating the value function (Q-value), leading to inaccurate Q-value estimates. Additionally, the sparse reward function in DDPG slows down convergence during model training, while the random uniform sampling approach uses the sample data inefficiently. To address these challenges, this paper introduces a dueling network to improve Q-value estimation accuracy within the DDPG framework. The reward function is redesigned to guide the mobile robot toward more efficient and reasonable movement. Furthermore, the single experience replay buffer is split into two buffers, and a dynamic adaptive sampling mechanism is adopted to enhance replay efficiency. Finally, the proposed algorithm is evaluated in a simulation environment built with the robot operating system (ROS) and the Gazebo platform. Experimental results demonstrate that, compared with the standard DDPG algorithm, the proposed approach reduces training time by 17.8%, improves convergence speed by 57.46%, and increases the success rate by 3%. Moreover, the proposed method outperforms other algorithms in terms of stability during model training, significantly improving the efficiency and success rate of mobile robot path planning.
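As a rough illustration of the experience-pool separation and dynamic adaptive sampling described above, the sketch below keeps two replay pools and draws each mini-batch from both with a mixing ratio that changes as training progresses. The split criterion (positive vs. non-positive reward) and the linear ratio schedule are assumptions made for illustration only; the paper's exact rules are not given in this abstract.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Sketch of a two-pool replay buffer with an adaptive sampling ratio."""

    def __init__(self, capacity: int = 100_000):
        self.high = deque(maxlen=capacity)  # e.g. transitions with positive reward
        self.low = deque(maxlen=capacity)   # all remaining transitions

    def add(self, state, action, reward, next_state, done):
        pool = self.high if reward > 0 else self.low
        pool.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int, progress: float):
        """progress in [0, 1]: early training favours the 'high' pool,
        later training moves back toward a more uniform mix."""
        ratio = 0.8 - 0.3 * progress                      # assumed schedule: 0.8 -> 0.5
        n_high = min(int(batch_size * ratio), len(self.high))
        n_low = min(batch_size - n_high, len(self.low))
        batch = random.sample(self.high, n_high) + random.sample(self.low, n_low)
        random.shuffle(batch)
        return batch
```

The intended effect is that informative transitions are replayed more often early on, while the adaptive ratio prevents the model from overfitting to one pool as training matures.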

Key words: path planning, deep deterministic policy gradient (DDPG), dueling network, experience pool separation, dynamic adaptive sampling

CLC number: