[1] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need [J]. Advances in Neural Information Processing Systems, 2017, 30: 5998-6008.
[2] Radford A, Wu J, Child R, et al. Language models are unsupervised multitask learners [EB/OL]. [2024-10-30]. https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.
[3] Bai Y, Jones A, Ndousse K, et al. Training a helpful and harmless assistant with reinforcement learning from human feedback [DB/OL]. (2022-04-22) [2024-10-30]. http://arxiv.org/abs/2204.05862.
[4] Touvron H, Lavril T, Izacard G, et al. LLaMA: open and efficient foundation language models [DB/OL]. (2023-02-27) [2024-10-30]. http://arxiv.org/abs/2302.13971.
[5] Black S, Biderman S, Hallahan E, et al. GPT-NeoX-20B: an open-source autoregressive language model [DB/OL]. (2022-04-14) [2024-10-30]. http://arxiv.org/abs/2204.06745.
[6] Firdhous M F M, Elbreiki W, Abdullahi I, et al. WormGPT: a large language model chatbot for criminals [C]//2023 24th International Arab Conference on Information Technology (ACIT). IEEE, 2023: 1-6.
[7] Liu A, Pan L, Lu Y, et al. A survey of text watermarking in the era of large language models [J]. ACM Computing Surveys, 2024, 57(2): 1-36.
[8] Brassil J T, Low S, Maxemchuk N F, et al. Electronic marking and identification techniques to discourage document copying [J]. IEEE Journal on Selected Areas in Communications, 1995, 13(8): 1495-1504.
[9] Por L Y, Wong K S, Chee K O. UniSpaCh: a text-based data hiding method using Unicode space characters [J]. Journal of Systems and Software, 2012, 85(5): 1075-1082.
[10] Sato R, Takezawa Y, Bao H, et al. Embarrassingly simple text watermarks [DB/OL]. (2023-10-13) [2024-10-30]. http://arxiv.org/abs/2310.08920.
[11] 刘豪, 孙星明, 刘晋飚. 基于字体颜色的文本数字水印算法 [J]. 计算机工程, 2005, 31(15): 129-131. Liu H, Sun X M, Liu J B. Color-based watermarking algorithm for text documents [J]. Computer Engineering, 2005, 31(15): 129-131. (in Chinese)
[12] Topkara U, Topkara M, Atallah M J. The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions [C]//8th Workshop on Multimedia and Security, 2006: 164-174.
[13] Munyer T, Tanvir A, Das A, et al. DeepTextMark: a deep learning-driven text watermarking approach for identifying large language model generated text [DB/OL]. (2023-05-09) [2024-10-30]. http://arxiv.org/abs/2305.05773.
[14] Abdelnabi S, Fritz M. Adversarial watermarking transformer: towards tracing text provenance with data hiding [C]//2021 IEEE Symposium on Security and Privacy (SP). IEEE, 2021: 121-140.
[15] Sun Z, Du X, Song F, et al. CoProtector: protect open-source code against unauthorized training usage with data poisoning [C]//ACM Web Conference, 2022: 652-660.
[16] Kirchenbauer J, Geiping J, Wen Y, et al. A watermark for large language models [C]//International Conference on Machine Learning, 2023: 17061-17084.
[17] Christ M, Gunn S, Zamir O. Undetectable watermarks for language models [C]//The 37th Annual Conference on Learning Theory, 2024: 1125-1139.
[18] Guo B, Zhang X, Wang Z, et al. How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection [DB/OL]. (2023-01-18) [2024-10-30]. http://arxiv.org/abs/2301.07597.