Deep learning has advanced natural language processing, and information hiding methods based on text generation show great potential. This paper proposes a text information hiding method based on neural machine translation that embeds information while the translated text is being generated. The neural machine translation model uses a beam search decoder to obtain the candidate words at each position of the output sentence during translation, and the candidate words are encoded according to their probability ranking. Then, as the target-language text is decoded and output, the candidate word whose code matches the binary bitstream of the secret information is selected, realizing information embedding at the word level. Experimental results show that, compared with existing text information hiding methods based on machine translation, this method significantly improves the embedding rate and shows good security and resistance to steganalysis.
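The word-level embedding described above can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not confirm: each decoding step is represented by a fixed dictionary of candidate words and probabilities (in a real translation model the candidates would depend on the words already chosen), the candidates are given fixed-length binary codes by rank, and the names `STEPS`, `rank_candidates`, `embed`, and `extract` are hypothetical. It is a toy illustration of rank-based coding, not the authors' implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical candidate distributions for two decoding steps (toy data).
STEPS = [
    {"the": 0.5, "a": 0.3, "this": 0.15, "that": 0.05},
    {"cat": 0.4, "dog": 0.35, "bird": 0.2, "fish": 0.05},
]


def rank_candidates(probs: Dict[str, float], bits_per_word: int) -> List[str]:
    """Return the 2**bits_per_word most probable words, most probable first."""
    # Ties are broken alphabetically so sender and receiver agree on the order.
    ranked = sorted(probs, key=lambda w: (-probs[w], w))
    size = 2 ** bits_per_word
    if len(ranked) < size:
        raise ValueError("not enough candidates for the requested code length")
    return ranked[:size]


def embed(steps: Sequence[Dict[str, float]], bits: str, bits_per_word: int) -> List[str]:
    """Pick, at each step, the candidate whose rank equals the next bit chunk."""
    words = []
    for i, probs in enumerate(steps):
        chunk = bits[i * bits_per_word:(i + 1) * bits_per_word]
        chunk = chunk.ljust(bits_per_word, "0")  # pad a short final chunk with zeros
        pool = rank_candidates(probs, bits_per_word)
        words.append(pool[int(chunk, 2)])
    return words


def extract(steps: Sequence[Dict[str, float]], words: Sequence[str], bits_per_word: int) -> str:
    """Recover the bits by looking up each word's rank in the same candidate pool."""
    recovered = []
    for probs, word in zip(steps, words):
        pool = rank_candidates(probs, bits_per_word)
        recovered.append(format(pool.index(word), f"0{bits_per_word}b"))
    return "".join(recovered)


stego_words = embed(STEPS, "1001", bits_per_word=2)
recovered_bits = extract(STEPS, stego_words, bits_per_word=2)
```

With two bits per word, each generated word carries two bits of the secret message; the receiver needs the same candidate pools (in practice, the same model and source text) to invert the ranking.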
YU Shuangsheng, YANG Zhongliang, JIANG Minyu, HUANG Yongfeng. Text Steganography Based on Neural Machine Translation[J]. Journal of Applied Sciences, 2020, 38(6): 976-985.
DOI: 10.3969/j.issn.0255-8297.2020.06.014