To address the convergence difficulty, unsatisfactory training results, and poor generalization of autonomous driving algorithms based on reinforcement learning, this paper proposes an autonomous driving system based on meta-learning and reinforcement learning. The system first combines a variational autoencoder (VAE) with a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to form the VWG (VAE-WGAN-GP) model, which improves the quality of the extracted features. The meta-learning algorithm Reptile is then used to train the VWG feature extraction model, yielding the MVWG (Meta-VWG) feature extraction model and accelerating training. Finally, the feature extraction model is combined with the proximal policy optimization (PPO) decision algorithm, and the reward function in the PPO algorithm is refined to speed up convergence of the decision model, resulting in the MVWG-PPO autonomous driving model. Experimental results show that, compared with the VAE, VW (VAE-WGAN), and VWG benchmark models, the proposed MVWG feature extraction model reduces reconstruction loss by 60.82%, 44.73%, and 29.09%, respectively, and its convergence rate increases approximately fivefold. It produces clearer reconstructed images, performs better in autonomous driving tasks, and can provide higher-quality feature information for autonomous vehicles. In addition, compared with the benchmark decision model, the model with the improved reward function exhibits an 11.33% increase in convergence rate, demonstrating the superiority of the proposed method.
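As a rough illustration of the Reptile meta-update mentioned above, the following minimal sketch applies it to toy quadratic tasks in plain Python. It is not the authors' MVWG training code: the function names (`inner_sgd`, `reptile_step`, `quadratic_task`), the toy loss, and all hyperparameter values are assumptions chosen only to show how the shared weights are moved toward the task-adapted weights.

```python
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def inner_sgd(theta: Vector, grad_fn: GradFn, steps: int, lr: float) -> Vector:
    """Run a few plain SGD steps on one task, starting from the shared weights."""
    phi = list(theta)
    for _ in range(steps):
        grad = grad_fn(phi)
        phi = [p - lr * g for p, g in zip(phi, grad)]
    return phi


def reptile_step(
    theta: Vector,
    task_grads: List[GradFn],
    inner_steps: int = 5,
    inner_lr: float = 0.1,
    meta_lr: float = 0.5,
) -> Vector:
    """One batched Reptile update: move theta toward the task-adapted weights."""
    adapted = [inner_sgd(theta, g, inner_steps, inner_lr) for g in task_grads]
    n = len(adapted)
    # Average displacement (phi_task - theta) over the sampled tasks.
    delta = [
        sum(phi[i] - theta[i] for phi in adapted) / n for i in range(len(theta))
    ]
    return [t + meta_lr * d for t, d in zip(theta, delta)]


def quadratic_task(center: Vector) -> GradFn:
    """Hypothetical toy task: loss 0.5 * ||w - center||^2, gradient w - center."""
    return lambda w: [wi - ci for wi, ci in zip(w, center)]


# Two toy tasks with different optima; the values are illustrative only.
tasks = [quadratic_task([1.0, 1.0]), quadratic_task([3.0, 3.0])]
theta = [0.0, 0.0]
for _ in range(50):
    theta = reptile_step(theta, tasks)
```

In this toy setting the meta-learned initialization settles between the two task optima, so a few inner SGD steps reach either one quickly. In a setting like the one described in the abstract, the toy gradient functions would be replaced by gradients of the feature extraction model's loss on each sampled task.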
JIN Yanliang, FAN Baorong, GAO Yuan, WANG Xiaoyong, GU Chenjie. Autonomous Driving Algorithm Based on Meta-Learning and Reinforcement Learning[J]. Journal of Applied Sciences, 2024, 42(5): 795-809.
DOI: 10.3969/j.issn.0255-8297.2024.05.007