Journal of Applied Sciences, 2024, Vol. 42, Issue (6): 1064-1077. doi: 10.3969/j.issn.0255-8297.2024.06.014

• Computer Science and Applications •

User Identification Method Using Proximity and Content Features

LU Jing, YOU Chenlu, GAI Qikai, LIU Cong   

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2022-12-01  Online: 2024-11-30  Published: 2024-11-30

Abstract: Social networks restrict access to user topology, which greatly reduces the accuracy of identification methods that rely on structural features. We present a proximity- and content-based user identification method built on XGBoost, a semi-supervised model that integrates attribute, structural, and content features and recasts cross-social-network user identification as a binary classification task. To address incomplete topology information and a shortage of seed users, we propose a method for extracting explicit and implicit friends. The friend networks of each user pair to be matched are fused according to three friend types: explicit friends, implicit friends, and other friends. User importance is then incorporated to improve the empirical probability of second-order proximity in the LINE algorithm, yielding the structural feature. For the content features, we extract time-sequence features, keyword-overlap features, and followee-tag features. Finally, all features are fused to complete user identification. Experiments on real datasets demonstrate the effectiveness of the method.
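
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the friend-type weights, the use of Jaccard overlap for the keyword and followee-tag features, and all function and field names below are assumptions made purely for illustration. The first function starts from the standard LINE empirical distribution (edge weight divided by the node's total outgoing weight) and rescales each edge by a hypothetical friend-type weight and an importance score; the last function assembles a per-pair feature vector of the kind a binary classifier such as XGBoost could consume.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical weights for the three friend types named in the abstract.
# The abstract does not state how the types are weighted.
FRIEND_TYPE_WEIGHT = {"explicit": 1.0, "implicit": 0.5, "other": 0.25}


def weighted_empirical_probability(
    neighbors: Dict[str, Tuple[str, float]]
) -> Dict[str, float]:
    """Normalize edge weights into a distribution over neighbors.

    `neighbors` maps neighbor id -> (friend_type, importance).
    Each raw weight is friend-type weight * importance (an assumption);
    dividing by the total mirrors LINE's w_ij / sum_k w_ik.
    """
    raw = {
        n: FRIEND_TYPE_WEIGHT[ftype] * importance
        for n, (ftype, importance) in neighbors.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {}
    return {n: w / total for n, w in raw.items()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(user_a: dict, user_b: dict) -> List[float]:
    """Content features for one candidate user pair (illustrative only).

    Expects hypothetical keys "keywords" and "followee_tags" on each user.
    """
    return [
        jaccard(set(user_a["keywords"]), set(user_b["keywords"])),
        jaccard(set(user_a["followee_tags"]), set(user_b["followee_tags"])),
    ]
```

In a full system, vectors like the one returned by `pair_features` would be concatenated with attribute and structural features and labeled 1 (same person) or 0 (different people) before training the classifier.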

Key words: social network, user identification, proximity, XGBoost, user generated content
