Journal of Applied Sciences (应用科学学报) ›› 2021, Vol. 39 ›› Issue (2): 250-260. doi: 10.3969/j.issn.0255-8297.2021.02.007

• Communication Engineering •

Q-learning Based Relay Selection Strategy for Hybrid Satellite-Terrestrial Cooperative Transmission

WANG Xiaoxiao1, KONG Huaicong1, ZHU Weiping1,2, LIN Min1,2

  1. College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China;
  2. Key Laboratory of Broadband Wireless Communication and Sensor Network Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
  • Received: 2019-11-28 Published: 2021-04-01
  • Corresponding author: ZHU Weiping, professor; research interests: wireless communication and machine learning. E-mail: zwp@njupt.edu.cn
  • Funding: National Natural Science Foundation of China (No. 61801234); Natural Science Foundation of Jiangsu Province (No. BK20160911); Postgraduate Research and Practice Innovation Program of Jiangsu Province (No. KYCX19_0950); Open Research Fund of the Key Laboratory of Broadband Wireless Communication and Sensor Network Technology, Ministry of Education, Nanjing University of Posts and Telecommunications (No. JZNY201701)

Abstract: Relaying in cooperative networks provides spatial diversity, but system performance depends heavily on which relay is selected. To address this problem, a Q-learning based relay selection strategy for hybrid satellite-terrestrial cooperative transmission is proposed. First, assuming that all relay nodes use the amplify-and-forward protocol, an expression for the output signal-to-noise ratio after maximal ratio combining at the receiver is derived. Next, the state, action, and reward function of Q-learning are defined so that the relay node with the largest cumulative return is selected. Then, to traverse all states, the Boltzmann selection policy is introduced to choose actions probabilistically, so that the source node explores all states and exploits the optimal one. Finally, power is allocated between the selected relay node and the source node to obtain the optimal transmission power. Simulation results show that, compared with random relay selection, the proposed strategy substantially improves system performance.

Key words: hybrid satellite-terrestrial cooperative network, relay selection, Q-learning, Boltzmann selection policy, power allocation

CLC Number:
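
The abstract describes the learning loop only in outline, so the following is a minimal sketch of how tabular Q-learning with Boltzmann (softmax) action selection could pick a relay. It is not the authors' implementation. Everything specific here is an assumption made for illustration: the state is taken to be the relay used in the previous slot, the action is the relay chosen for the next slot, the reward is a noisy sample around a hypothetical per-relay mean SNR (`mean_snr`), and the function names, learning rate, discount factor, and temperature schedule are invented. The amplify-and-forward SNR expression and the power allocation step are not modeled.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Boltzmann policy: P(a) is proportional to exp(Q(a) / T)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_relay(mean_snr, episodes=3000, alpha=0.1, gamma=0.5,
                 t_start=5.0, t_end=0.05, seed=0):
    """Tabular Q-learning over relays with Boltzmann exploration.

    Assumed model (not from the paper): state = relay used in the previous
    slot, action = relay chosen for the next slot, reward = noisy SNR sample
    drawn around mean_snr[action].
    """
    rng = random.Random(seed)
    n = len(mean_snr)
    q = [[0.0] * n for _ in range(n)]  # q[state][action]
    state = rng.randrange(n)
    for k in range(episodes):
        # Geometric cooling: high T explores broadly, low T exploits.
        frac = k / max(1, episodes - 1)
        temperature = t_start * (t_end / t_start) ** frac
        probs = boltzmann_probs(q[state], temperature)
        action = rng.choices(range(n), weights=probs)[0]
        reward = rng.gauss(mean_snr[action], 1.0)  # hypothetical reward
        next_state = action
        # Standard Q-learning update toward the one-step TD target.
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    # Pick the relay whose learned value, summed over states, is largest.
    best = max(range(n), key=lambda a: sum(q[s][a] for s in range(n)))
    return best, q


if __name__ == "__main__":
    best, _ = select_relay([3.0, 8.0, 5.0])
    print("selected relay index:", best)
```

The temperature controls the trade-off mentioned in the abstract: at a high temperature the selection probabilities are nearly uniform, so every relay is tried, while at a low temperature the policy concentrates on the relay with the largest learned value.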