Computer Science and Applications

A Path Planning Algorithm for Mobile Robots Based on an Improved Deep Deterministic Policy Gradient

  • 1. School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan 250357, Shandong, China;
    2. Institute of Automation, Shandong Academy of Sciences, Jinan 250013, Shandong, China

Received date: 2023-08-31

  Online published: 2025-06-23

Funding

China Postdoctoral Science Foundation (No. 2021M702030); Science and Technology Plan Project of Shandong Provincial Department of Transport (No. 2021B120)



Cite this article

Zhang Q L, Ni C, Wang P, Gong H. A path planning algorithm for mobile robots based on an improved deep deterministic policy gradient [J]. Journal of Applied Sciences, 2025, 43(3): 415-436. DOI: 10.3969/j.issn.0255-8297.2025.03.005

Abstract

The deep deterministic policy gradient (DDPG) algorithm utilizes an actor-critic framework to ensure smooth motion of mobile robots. However, the critic network often fails to distinguish effectively between different states and actions, leading to inaccurate Q-value estimates. Additionally, the sparse reward function in DDPG slows down convergence during model training, while the random uniform sampling approach utilizes the sample data inefficiently. To address these challenges, this paper introduces dueling networks to improve Q-value estimation accuracy within the DDPG framework. The reward function is optimized to guide the mobile robot toward more efficient and effective movement. Furthermore, the single experience replay buffer is split into two parts, and a dynamic adaptive sampling mechanism is adopted to enhance replay efficiency. Finally, the proposed algorithm is evaluated in a simulation environment built with the Robot Operating System (ROS) and the Gazebo platform. Experimental results demonstrate that compared to the standard DDPG algorithm, the proposed approach reduces training time by 17.8%, improves convergence speed by 57.46%, and increases the success rate by 3%. Moreover, the proposed method outperforms other algorithms in terms of stability during model training, significantly improving the efficiency and success rate of mobile robot path planning.
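Two of the modifications the abstract describes, the dueling critic head and the dual experience pools with adaptive sampling, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names, the NumPy stand-in for the critic network, the discrete set of candidate actions used to show the advantage term, and the fixed `success_ratio` argument (which the paper adapts dynamically during training) are all assumptions made here for clarity.

```python
import numpy as np

def dueling_q(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into
    Q-values: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)).
    Subtracting the mean advantage is the standard trick that keeps the
    V/A decomposition identifiable."""
    advantages = np.asarray(advantages, dtype=float)
    return value + (advantages - advantages.mean())

def sample_dual_buffers(success_buf, other_buf, batch_size, success_ratio, rng):
    """Draw one mini-batch from two separate experience pools, taking a
    fraction `success_ratio` of the batch from the success pool and the
    remainder from the ordinary pool."""
    n_success = min(int(round(batch_size * success_ratio)), len(success_buf))
    n_other = batch_size - n_success
    batch = [success_buf[i] for i in rng.integers(0, len(success_buf), n_success)]
    batch += [other_buf[i] for i in rng.integers(0, len(other_buf), n_other)]
    return batch
```

In a full agent, `value` and `advantages` would come from two separate output streams of the critic network rather than being passed in directly, and `success_ratio` would be recomputed each episode from recent training statistics.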
