Signal and Information Processing

Text Steganography Based on Neural Machine Translation

  • 1. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China;
    2. Department of Computer Science, Rice University, Texas 77005, United States

Received date: 2019-11-29

  Online published: 2020-12-08

Funding

Supported by the National Key Research and Development Program of China (No. SQ2018YGX210002) and the National Natural Science Foundation of China (No. U1536201, No. U1636113, No. U1705261)



Cite this article

尉爽生, 杨忠良, 江旻宇, 黄永峰. Text steganography based on neural machine translation[J]. Journal of Applied Sciences, 2020, 38(6): 976-985. DOI: 10.3969/j.issn.0255-8297.2020.06.014

Abstract

Deep learning has promoted the development of natural language processing, and information hiding methods based on text generation show great potential in this area. This paper proposes a text information hiding method based on neural machine translation, which embeds information during the generation of the translated text. The neural machine translation model uses a beam search decoder: at each position of the target-language sequence, beam search yields a set of candidate words, which are encoded according to their probability ranking. Then, while decoding and outputting the target-language text, candidate words whose codes match the binary bitstream of the secret information are selected, realizing information embedding at the word level. Experimental results show that, compared with existing machine-translation-based text information hiding methods, the proposed method significantly improves the hiding capacity while maintaining good anti-steganalysis capability and security.
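The candidate-encoding scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `TOY_CANDIDATES` stands in for the probability-ranked candidate sets that a real NMT beam-search decoder would produce at each target position, and the function names are hypothetical.

```python
# Toy stand-in for an NMT beam-search decoder: for each target position,
# a list of candidate words ranked by model probability (highest first).
TOY_CANDIDATES = [
    ["the", "a", "one", "this"],
    ["cat", "dog", "bird", "fox"],
    ["sat", "slept", "ran", "hid"],
]

def embed(bits, candidates_per_step, bits_per_word=2):
    """Embed a bitstream by selecting, at each decoding step, the candidate
    whose rank (in probability order) encodes the next bits_per_word bits."""
    out, i = [], 0
    for cands in candidates_per_step:
        if i < len(bits):
            chunk = bits[i:i + bits_per_word].ljust(bits_per_word, "0")
            idx = int(chunk, 2)  # rank index encodes the secret bits
            i += bits_per_word
        else:
            idx = 0  # payload exhausted: fall back to the most likely word
        out.append(cands[idx])
    return " ".join(out)

def extract(stego_text, candidates_per_step, bits_per_word=2):
    """Recover the bitstream from the rank of each chosen word."""
    bits = []
    for word, cands in zip(stego_text.split(), candidates_per_step):
        bits.append(format(cands.index(word), f"0{bits_per_word}b"))
    return "".join(bits)

stego = embed("011011", TOY_CANDIDATES)
recovered = extract(stego, TOY_CANDIDATES)
```

With candidate sets of size 2^k, each generated word carries k secret bits, which is why capacity grows with the candidate pool; extraction requires the receiver to reproduce the same ranked candidate sets, i.e. to share the translation model and the source text.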
