Click-through rate (CTR) prediction is a fundamental task in recommendation systems. Dual-stream models are widely adopted in mainstream recommendation frameworks because of their flexibility, scalability, and efficiency in information interaction and fusion. To further improve CTR prediction performance, this paper proposes the FJ hybrid network (FinalBlock-JRC hybrid network, FJHN), which builds on the dual-stream structure and integrates the factorized interaction block (FinalBlock) with the joint ranking and calibration (JRC) loss optimization algorithm. First, a feature gating layer provides differentiated feature inputs, strengthening the representation of important features. Then, FinalBlock is combined with a multilayer perceptron (MLP) to strengthen high-order feature interaction learning. Next, an enhanced interaction aggregation layer fuses the outputs of the towers, deepening the degree of feature interaction. Finally, an improved JRC mechanism computes the loss function, which improves the model's prediction accuracy and its adaptability across diverse application scenarios. Experiments on three publicly available benchmark datasets show that FJHN achieves noticeable gains in CTR prediction over several mainstream models, including the self-attention model (SAM).
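
The data flow summarized above (gating, two parallel streams, fusion, probability) can be illustrated with a minimal sketch in plain Python. It is not the FJHN implementation. Every name, layer size, and formula below is an assumption made for illustration: the gate uses a 2·sigmoid rescaling, `multiplicative_stream` is a hypothetical stand-in for a factorized interaction block, and `bilinear_fusion` is one possible aggregation layer. The improved JRC loss is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(x, weights, bias):
    """Dense layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def feature_gate(x, weights, bias):
    """Rescale every input feature by a learned gate in (0, 2) (assumed form)."""
    gates = [2.0 * sigmoid(g) for g in linear(x, weights, bias)]
    return [g * v for g, v in zip(gates, x)]


def mlp_stream(x, weights, bias):
    """Implicit-interaction stream: a single dense layer with ReLU."""
    return [max(0.0, v) for v in linear(x, weights, bias)]


def multiplicative_stream(x, w_a, b_a, w_b, b_b):
    """Hypothetical stand-in for a factorized interaction block:
    the element-wise product of two linear projections of the input."""
    a = linear(x, w_a, b_a)
    b = linear(x, w_b, b_b)
    return [u * v for u, v in zip(a, b)]


def bilinear_fusion(o1, o2, w1, w2, w3, bias):
    """logit = bias + w1.o1 + w2.o2 + o1^T W3 o2 (assumed aggregation)."""
    first_order = (sum(w * v for w, v in zip(w1, o1))
                   + sum(w * v for w, v in zip(w2, o2)))
    cross = sum(o1[i] * w3[i][j] * o2[j]
                for i in range(len(o1)) for j in range(len(o2)))
    return bias + first_order + cross


def init_params(rng, dim, hidden):
    """Random parameters for the sketch; a real model would learn them."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    def vector(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "gate1": (matrix(dim, dim), vector(dim)),
        "gate2": (matrix(dim, dim), vector(dim)),
        "mlp": (matrix(hidden, dim), vector(hidden)),
        "mult": (matrix(hidden, dim), vector(hidden),
                 matrix(hidden, dim), vector(hidden)),
        "fusion": (vector(hidden), vector(hidden),
                   matrix(hidden, hidden), rng.uniform(-0.5, 0.5)),
    }


def predict_ctr(x, params):
    """Gate the input separately per stream, run both streams, fuse, squash."""
    x1 = feature_gate(x, *params["gate1"])
    x2 = feature_gate(x, *params["gate2"])
    o1 = mlp_stream(x1, *params["mlp"])
    o2 = multiplicative_stream(x2, *params["mult"])
    return sigmoid(bilinear_fusion(o1, o2, *params["fusion"]))


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_params(rng, dim=4, hidden=3)
    print(predict_ctr([0.2, -1.0, 0.5, 1.5], params))
```

In this sketch the two streams receive differently gated copies of the same input vector, and the cross term in the fusion lets the stream outputs interact multiplicatively rather than being simply summed or concatenated.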