Journal of Applied Sciences, 2025, Vol. 43, Issue (2): 208-221. doi: 10.3969/j.issn.0255-8297.2025.02.002

• Communication Engineering •

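3D Trajectory Optimization and Resource Allocation in UAV-Assisted NOMA Communication Systems

ZHU Yaohui, WANG Tao, PENG Zhenchun, LIU Han

  1. School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
  • Received: 2023-05-31 Published: 2025-04-03
  • Corresponding author: WANG Tao, professor and doctoral supervisor; research interests include the optimal design of energy-efficient wireless communication and signal processing systems. E-mail: twang@shu.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (No. 61671011)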

Abstract: UAV-assisted communication systems are an important component of future wireless networks. To further improve the utilization of time-frequency resources in such systems, this paper studies a UAV-assisted communication architecture based on non-orthogonal multiple access (NOMA) and proposes the TD3-TOPATM (twin delayed-trajectory optimization and power allocation for total throughput maximization) algorithm, which is built on the twin delayed deep deterministic policy gradient (TD3) method. TD3-TOPATM jointly optimizes the UAV's 3D trajectory and power allocation strategy to maximize the total throughput subject to maximum power, spatial boundary, maximum flight speed, and quality of service (QoS) constraints. Simulation results show that TD3-TOPATM achieves a 98% performance gain over a random-optimization baseline, a 19.4% gain over a deep Q-network (DQN)-based trajectory optimization and resource allocation algorithm, and a 9.7% increase in total throughput over a deep deterministic policy gradient (DDPG)-based trajectory optimization and resource allocation algorithm. Furthermore, the NOMA-based UAV-assisted communication scheme achieves a 55% performance gain over an orthogonal multiple access (OMA)-based scheme.
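To make the NOMA-versus-OMA comparison concrete, the sketch below computes two-user downlink rates for a single hovering UAV. It is an illustrative sketch under stated assumptions, not the paper's system model: it assumes a free-space line-of-sight channel gain β0/d², a total power budget split between a near (strong) and a far (weak) user, successive interference cancellation (SIC) at the strong user, and a TDMA-style orthogonal baseline. The function names (`los_channel_gain`, `noma_two_user_rates`, `oma_two_user_rates`, `meets_qos`) and all numeric parameters are hypothetical. In a TD3-based scheme of the kind described, quantities such as the power split `a_strong` and the UAV position would presumably be part of the agent's action.

```python
import math


def los_channel_gain(uav_pos, user_pos, beta0=1e-4):
    """Assumed free-space LoS gain beta0 / d**2, where beta0 is the gain at 1 m."""
    d_sq = sum((u - v) ** 2 for u, v in zip(uav_pos, user_pos))
    return beta0 / d_sq


def noma_two_user_rates(p_max, a_strong, g_strong, g_weak, noise):
    """Two-user downlink power-domain NOMA rates in bit/s/Hz.

    a_strong is the fraction of p_max given to the user with the larger gain;
    the weaker user receives the remaining (larger) share.
    """
    if g_strong < g_weak:
        raise ValueError("g_strong must be >= g_weak (decoding order)")
    if not 0.0 <= a_strong <= 0.5:
        raise ValueError("the weak user must receive at least half the power")
    p_strong = a_strong * p_max
    p_weak = (1.0 - a_strong) * p_max
    # Weak user decodes its own signal, treating the strong user's signal as noise.
    r_weak = math.log2(1 + p_weak * g_weak / (p_strong * g_weak + noise))
    # Strong user first decodes and subtracts the weak user's signal (SIC); this
    # succeeds at rate r_weak because g_strong >= g_weak. It then decodes its own
    # signal without intra-cluster interference.
    r_strong = math.log2(1 + p_strong * g_strong / noise)
    return r_strong, r_weak


def oma_two_user_rates(p_max, g_strong, g_weak, noise):
    """TDMA-style orthogonal baseline: each user gets half the time at full power."""
    return (0.5 * math.log2(1 + p_max * g_strong / noise),
            0.5 * math.log2(1 + p_max * g_weak / noise))


def meets_qos(rates, r_min):
    """Check a per-user minimum-rate (QoS) constraint."""
    return all(r >= r_min for r in rates)


if __name__ == "__main__":
    uav = (0.0, 0.0, 100.0)                        # UAV hovering 100 m above the origin
    g1 = los_channel_gain(uav, (50.0, 0.0, 0.0))   # near (strong) user
    g2 = los_channel_gain(uav, (300.0, 0.0, 0.0))  # far (weak) user
    noma = noma_two_user_rates(0.1, 0.2, g1, g2, noise=1e-12)
    oma = oma_two_user_rates(0.1, g1, g2, noise=1e-12)
    print(f"NOMA sum rate: {sum(noma):.2f} bit/s/Hz, QoS(1.0) met: {meets_qos(noma, 1.0)}")
    print(f"OMA  sum rate: {sum(oma):.2f} bit/s/Hz")
```

With these made-up numbers (0.1 W budget, 20% of the power to the near user, 1e-12 W noise), the NOMA sum rate comes out at roughly 9.6 bit/s/Hz against about 8.2 bit/s/Hz for the orthogonal split, while both users still clear a 1 bit/s/Hz QoS floor. Moving the UAV changes both gains through d², which is why trajectory and power allocation are coupled and worth optimizing jointly.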

Key words: deep reinforcement learning, UAV-assisted communication, 3D trajectory optimization, non-orthogonal multiple access, twin delayed deep deterministic policy gradient