应用科学学报, 2024, Vol. 42, Issue 5: 795-809. doi: 10.3969/j.issn.0255-8297.2024.05.007

• 计算机科学与应用 •

基于元学习和强化学习的自动驾驶算法

金彦亮1,2, 范宝荣1,2, 高塬1,2, 汪小勇3,4, 顾晨杰1,2   

  1. 上海大学 通信与信息工程学院, 上海 200444;
    2. 上海大学 上海先进通信与数据科学研究院, 上海 200444;
    3. 卡斯柯信号有限公司, 上海 200070;
    4. 上海轨道交通无人驾驶列控系统工程技术研究中心, 上海 200434
  • 收稿日期:2023-03-05 发布日期:2024-09-29
  • Corresponding author: JIN Yanliang, associate professor and doctoral supervisor; his research interests include autonomous driving and artificial intelligence. E-mail: jinyanliang@staff.shu.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (No. 22ZR1422200)

Autonomous Driving Algorithm Based on Meta-Learning and Reinforcement Learning

JIN Yanliang1,2, FAN Baorong1,2, GAO Yuan1,2, WANG Xiaoyong3,4, GU Chenjie1,2   

  1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China;
    2. Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai 200444, China;
    3. CASCO Signal Co., Ltd., Shanghai 200070, China;
    4. Shanghai Rail Transit Unmanned Train Control System Engineering and Technology Research Center, Shanghai 200434, China
  • Received:2023-03-05 Published:2024-09-29

摘要: 针对基于强化学习的自动驾驶算法存在收敛困难、训练效果不理想、泛化性能差等问题,提出了一种基于元学习和强化学习的自动驾驶系统。该系统首先将变分自编码器(variational auto-encoder,VAE)与具有梯度惩罚的Wasserstein生成对抗网络(Wasserstein generative adversarial network with gradient penalty,WGAN-GP)相结合形成VWG (VAE-WGAN-GP)模型,提高了所提取特征的质量;然后用元学习算法Reptile训练VWG特征提取模型,进一步得到MVWG(meta-VWG)特征提取模型,以提高模型的训练速度;最后将特征提取模型与近端策略优化(proximal policy optimization,PPO)决策算法相结合,对PPO算法中的奖励函数进行改进,提高了决策模型的收敛速度,最终得到MVWG-PPO自动驾驶模型。实验结果表明,该文提出的MVWG特征提取模型与VAE、VW(VAE-WGAN)、VWG基准模型相比,重构损失分别降低了60.82%、44.73%和29.09%,收敛速度均提高约5.00倍,重构图像更加清晰,并且在自动驾驶任务中的表现也更好,能够为智能车提供更高质量的特征信息。同时,改进奖励函数后的决策模型与基准决策模型相比,收敛速度也提高了11.33%,充分证明了该文方法的先进性。
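
The following is a minimal PyTorch sketch of the kind of VWG (VAE-WGAN-GP) feature extractor described in the abstract: a convolutional VAE whose reconstructions are additionally scored by a WGAN-GP critic. It assumes 64×64 RGB input frames; the latent dimension, network widths, and loss weights (lambda_gp, beta) are illustrative assumptions, not the settings used in the paper.

```python
import torch
import torch.nn as nn

LATENT_DIM = 64  # assumed latent size; the paper's setting may differ


class Encoder(nn.Module):
    """Convolutional VAE encoder: 64x64x3 frame -> (mu, logvar)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),    # 64x64 -> 32x32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),  # 16x16 -> 8x8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, LATENT_DIM)
        self.fc_logvar = nn.Linear(128 * 8 * 8, LATENT_DIM)

    def forward(self, x):
        h = self.conv(x)
        return self.fc_mu(h), self.fc_logvar(h)


class Decoder(nn.Module):
    """Mirror of the encoder: latent code -> reconstructed frame."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT_DIM, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.deconv(self.fc(z).view(-1, 128, 8, 8))


class Critic(nn.Module):
    """WGAN critic (no sigmoid); the gradient penalty replaces weight clipping."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(64 * 16 * 16, 1),
        )

    def forward(self, x):
        return self.net(x)


def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)


def gradient_penalty(critic, real, fake):
    """WGAN-GP term: push the critic's gradient norm toward 1 on interpolates."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    mixed = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(mixed).sum(), mixed, create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()


def vwg_losses(encoder, decoder, critic, x, lambda_gp=10.0, beta=1.0):
    """Illustrative combined objective: VAE reconstruction + KL divergence,
    plus an adversarial term from a WGAN-GP critic on the reconstructions."""
    mu, logvar = encoder(x)
    x_rec = decoder(reparameterize(mu, logvar))
    rec = nn.functional.mse_loss(x_rec, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    critic_loss = (critic(x_rec.detach()).mean() - critic(x).mean()
                   + lambda_gp * gradient_penalty(critic, x, x_rec.detach()))
    gen_loss = rec + beta * kl - critic(x_rec).mean()
    return gen_loss, critic_loss
```

In practice the critic and the encoder/decoder would be updated with separate optimizers, stepping only the parameters belonging to each loss.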

关键词: 自动驾驶, 特征提取, 强化学习, 元学习

Abstract: To address the convergence difficulties, unsatisfactory training results, and poor generalization of reinforcement-learning-based autonomous driving algorithms, this paper proposes an autonomous driving system based on meta-learning and reinforcement learning. The system first combines a variational auto-encoder (VAE) with a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to form the VWG (VAE-WGAN-GP) model, which improves the quality of the extracted features. The meta-learning algorithm Reptile is then used to train the VWG feature extraction model, yielding the MVWG (Meta-VWG) feature extraction model and accelerating training. Finally, the feature extraction model is combined with the proximal policy optimization (PPO) decision algorithm, and the reward function of PPO is refined to speed up the convergence of the decision model, resulting in the MVWG-PPO autonomous driving model. Experimental results show that, compared with the VAE, VW (VAE-WGAN), and VWG baseline models, the proposed MVWG feature extraction model reduces reconstruction loss by 60.82%, 44.73%, and 29.09%, respectively, converges about five times faster, produces clearer reconstructed images, and performs better in autonomous driving tasks, providing higher-quality feature information for intelligent vehicles. Meanwhile, compared with the baseline decision model, the decision model with the improved reward function converges 11.33% faster, fully demonstrating the superiority of the proposed method.
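
To make the Reptile stage concrete, here is a hedged sketch of one meta-update: copy the feature extractor, adapt the copy for a few gradient steps on one sampled driving scenario (task), then move the meta-parameters a fraction of the way toward the adapted weights. The helper sample_task_batches and all step counts and learning rates are assumptions for illustration.

```python
import copy
import torch


def reptile_meta_step(model, sample_task_batches, inner_loss_fn,
                      inner_steps=5, inner_lr=1e-4, meta_lr=0.1):
    """One Reptile outer step (sketch). `sample_task_batches()` is an assumed
    helper that yields minibatches from one randomly chosen driving scenario."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.Adam(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        batch = sample_task_batches()
        loss = inner_loss_fn(adapted, batch)   # e.g. the VWG generator loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Reptile update: theta <- theta + meta_lr * (theta_adapted - theta)
    with torch.no_grad():
        for p, p_adapted in zip(model.parameters(), adapted.parameters()):
            p += meta_lr * (p_adapted - p)
    return model
```

Reptile is a first-order method, so this outer loop needs no second-order gradients, which keeps meta-training of the feature extractor cheap.

The abstract states that the PPO reward function was improved but does not give its exact form, so the snippet below is only a generic example of the kind of shaped driving reward a PPO agent might optimize (speed tracking, lane keeping, collision penalty); the terms and weights are assumptions, not the authors' design.

```python
def driving_reward(speed, target_speed, lane_offset, collided,
                   w_speed=1.0, w_lane=0.5, crash_penalty=100.0):
    """Illustrative shaped reward for a driving policy (not the paper's)."""
    if collided:
        return -crash_penalty
    speed_term = 1.0 - abs(speed - target_speed) / max(target_speed, 1e-6)
    lane_term = -abs(lane_offset)  # penalize deviation from the lane center
    return w_speed * speed_term + w_lane * lane_term
```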

Key words: autonomous driving, feature extraction, reinforcement learning, meta-learning

中图分类号: