Q-learning Based Relay Selection Strategy for Hybrid Satellite-Terrestrial Cooperative Transmission

WANG Xiaoxiao; KONG Huaicong; ZHU Weiping; LIN Min

doi:10.3969/j.issn.0255-8297.2021.02.007

Journal of Applied Sciences, 2021, Vol. 39, Issue 2: 250-260

Communication Engineering

1. College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China;
2. Key Laboratory of Broadband Wireless Communication and Sensor Network Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China

Received date: 2019-11-28

Online published: 2021-04-01

Funding

Supported by the National Natural Science Foundation of China (No. 61801234); the Natural Science Foundation of Jiangsu Province (No. BK20160911); the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. KYCX19_0950); and the Open Research Fund of the Key Laboratory of Broadband Wireless Communication and Sensor Network Technology, Ministry of Education, Nanjing University of Posts and Telecommunications (No. JZNY201701).

Cite this article

WANG Xiaoxiao, KONG Huaicong, ZHU Weiping, LIN Min. Q-learning Based Relay Selection Strategy for Hybrid Satellite-Terrestrial Cooperative Transmission[J]. Journal of Applied Sciences, 2021, 39(2): 250-260. DOI: 10.3969/j.issn.0255-8297.2021.02.007

Abstract

Relaying in cooperative networks can achieve spatial diversity, but system performance depends heavily on which relay is selected. To address this problem, a Q-learning based relay selection strategy for hybrid satellite-terrestrial cooperative transmission is proposed. First, assuming that all relay nodes employ the amplify-and-forward protocol, an expression for the output signal-to-noise ratio after maximal ratio combining at the receiver is derived. Next, the state, action and reward function of Q-learning are defined so that the relay node with the greatest cumulative return is selected. Then, in order to traverse all states, the Boltzmann selection policy is introduced to choose actions probabilistically, so that the source node can explore all states and exploit the optimal one. Finally, power is allocated between the selected relay node and the source node to obtain the optimal transmission power. Simulation results show that, compared with the random relay selection algorithm, the proposed strategy greatly improves system performance.

Keywords: hybrid satellite-terrestrial cooperative network; relay selection; Q-learning; Boltzmann selection policy; power allocation
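The selection loop outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the state, action or reward definitions, so the sketch assumes that the state is the currently selected relay, the action is the relay chosen next, and the reward is the achievable rate log2(1 + SNR). The textbook variable-gain amplify-and-forward SNR expression, the unit-mean exponential fading draw, and all hyperparameter values are likewise assumptions, and the function names are hypothetical. The power allocation step is omitted.

```python
import math
import random


def af_mrc_snr(snr_sd, snr_sr, snr_rd):
    """Output SNR after maximal ratio combining of the direct link and one
    amplify-and-forward relay link (textbook variable-gain form, assumed)."""
    return snr_sd + snr_sr * snr_rd / (snr_sr + snr_rd + 1.0)


def boltzmann_probs(q_row, temperature):
    """Boltzmann (softmax) action probabilities for one row of the Q-table.
    Subtracting the row maximum keeps exp() from overflowing."""
    q_max = max(q_row)
    weights = [math.exp((q - q_max) / temperature) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]


def select_relay(snr_sd, relay_links, episodes=2000, alpha=0.1, gamma=0.5,
                 t0=5.0, t_min=0.05, decay=0.995, seed=0):
    """Learn which relay to use with tabular Q-learning.

    relay_links holds one (mean source-relay SNR, mean relay-destination SNR)
    pair per candidate relay, in linear scale. All defaults are illustrative.
    """
    rng = random.Random(seed)
    n = len(relay_links)
    q = [[0.0] * n for _ in range(n)]   # q[state][action]
    state = rng.randrange(n)            # assumed state: relay currently in use
    temperature = t0
    for _ in range(episodes):
        # High temperature gives near-uniform exploration; low temperature
        # concentrates the choice on the action with the largest Q-value.
        probs = boltzmann_probs(q[state], temperature)
        action = rng.choices(range(n), weights=probs)[0]
        snr_sr, snr_rd = relay_links[action]
        # Assumed fading model: scale each mean SNR by a unit-mean exponential draw.
        snr = af_mrc_snr(snr_sd * rng.expovariate(1.0),
                         snr_sr * rng.expovariate(1.0),
                         snr_rd * rng.expovariate(1.0))
        reward = math.log2(1.0 + snr)   # assumed reward: achievable rate
        next_state = action
        # Standard Q-learning update toward reward plus discounted best next value.
        q[state][action] += alpha * (
            reward + gamma * max(q[next_state]) - q[state][action])
        state = next_state
        temperature = max(t_min, temperature * decay)
    best = max(range(n), key=lambda a: q[state][a])
    return best, q


if __name__ == "__main__":
    best, _ = select_relay(1.0, [(2.0, 2.0), (100.0, 100.0), (5.0, 1.0)])
    print("selected relay index:", best)
```

Decaying the temperature is one common way to move from exploring all relays to exploiting the best one; a fixed temperature would keep sampling poor relays indefinitely.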

