Journal of Applied Sciences ›› 2021, Vol. 39 ›› Issue (4): 532-544. doi: 10.3969/j.issn.0255-8297.2021.04.002

• CCF NCCA 2020 Special Issue •

Heterogeneous Information Network Representation Learning Based on Generative Adversarial Network

LIU Xinghong1, WANG Ying1,2,3, WANG Xin3,4, LAN Shumei1,2,3

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, Jilin, China;
  2. College of Software, Jilin University, Changchun 130012, Jilin, China;
  3. Key Laboratory of Symbol Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun 130012, Jilin, China;
  4. College of Computer Technology and Engineering, Changchun Institute of Technology, Changchun 130012, Jilin, China
  • Received: 2020-08-26 Published: 2021-08-04
  • Corresponding author: WANG Ying, professor, whose research interests are machine learning and data mining. E-mail: wangying2010@jlu.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (No.61872161, No.61976103); the Science and Technology Development Program of Jilin Province (No.2018101328JC, No.20200201297JC); the Outstanding Young Talents Fund of the Science and Technology Department of Jilin Province (No.20170520059JH); the Project Fund of the Development and Reform Commission of Jilin Province (No.2019C053-8); and the Scientific Research Project Fund of the Education Department of Jilin Province (No.JJKH20191257KJ)

Abstract: Traditional heterogeneous information networks typically suffer from high dimensionality and sparsity. To address this, we first propose an unsupervised learning model, heterogeneous network representation learning based on generative adversarial network (HNRL-GAN), which embeds the high-dimensional vertices of a heterogeneous information network into a low-dimensional vector space. We then analyze the shortcomings of HNRL-GAN and propose an improved model, heterogeneous network representation learning based on generative adversarial network plus plus (HNRL-GAN++). Finally, we apply HNRL-GAN and HNRL-GAN++ to node classification and node clustering on the DBLP, Yelp, and Aminer datasets to test the effectiveness of the two models. Experimental results show that: 1) both HNRL-GAN and HNRL-GAN++ achieve the goal of representing the high-dimensional sparse nodes of a heterogeneous information network as low-dimensional dense vectors; 2) compared with HNRL-GAN, HNRL-GAN++ better preserves the network structure information and semantic information of the high-dimensional space.

Key words: heterogeneous information network, generative adversarial network, network representation learning
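
Note: the abstract does not state the training objective of HNRL-GAN or HNRL-GAN++. For orientation only, the generic generative adversarial network objective on which such models build can be written as below. The symbols G (generator), D (discriminator), p_data (data distribution) and p_z (noise distribution) follow the standard GAN formulation and are assumptions here, not notation taken from this paper.

```latex
% Generic GAN minimax objective (background only, not the HNRL-GAN loss)
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

In a network representation setting, one would expect x to stand for real samples drawn from the network (for example, observed node pairs) and G(z) for generated ones, but how the two models instantiate these terms is described only in the full text.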

CLC number: