[1] Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel R S, Bengio Y. Show, attend and tell: neural image caption generation with visual attention[C]//Proceedings of the 32nd International Conference on Machine Learning, 2015: 2048-2057.
[2] Fu K, Jin J, Cui R, Sha F, Zhang C. Aligning where to see and what to tell: image captioning with region-based attention and scene-specific contexts[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2321-2334.
[3] Li S, Kulkarni G, Berg T L, Berg A C, Choi Y. Composing simple image descriptions using web-scale n-grams[C]//Proceedings of the Fifteenth Conference on Computational Natural Language Learning, 2011: 220-228.
[4] Mitchell M, Han X, Dodge J, Mensch A, Goyal A, Berg A, Daumé III H. Midge: generating image descriptions from computer vision detections[C]//Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012: 747-756.
[5] Kulkarni G, Premraj V, Ordonez V, Dhar S, Li S, Choi Y, Berg T L. BabyTalk: understanding and generating simple image descriptions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(12): 2891-2903.
[6] Yang Y, Teo C L, Daumé III H, Aloimonos Y. Corpus-guided sentence generation of natural images[C]//Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011: 444-454.
[7] Elliott D, Keller F. Image description using visual dependency representations[C]//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013: 1292-1302.
[8] Kuznetsova P, Ordonez V, Berg A C, Berg T L, Choi Y. Collective generation of natural image descriptions[C]//Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, 2012: 359-368.
[9] Kuznetsova P, Ordonez V, Berg T L, Choi Y. TreeTalk: composition and compression of trees for image descriptions[J]. Transactions of the Association for Computational Linguistics, 2014, 2: 351-362.
[10] Karpathy A, Li F F. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 3128-3137.
[11] Mao J, Xu W, Yang Y, Wang J, Huang Z, Yuille A. Deep captioning with multimodal recurrent neural networks (m-RNN)[C]//3rd International Conference on Learning Representations, 2015.
[12] Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014: 1724-1734.
[13] Vinyals O, Toshev A, Bengio S, Erhan D. Show and tell: a neural image caption generator[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 3156-3164.
[14] 张宇, 张鹏远, 颜永红. 基于注意力LSTM和多任务学习的远场语音识别[J]. 清华大学学报(自然科学版), 2018, 58(1): 249-253.
Zhang Y, Zhang P Y, Yan Y H. Long short-term memory with attention and multitask learning for distant speech recognition[J]. Journal of Tsinghua University (Science and Technology), 2018, 58(1): 249-253. (in Chinese)
[15] Chen L, Zhang H, Xiao J, Nie L, Shao J, Liu W, Chua T S. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 5659-5667.
[16] Lu J, Xiong C, Parikh D, Socher R. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017: 375-383.
[17] Anderson P, He X, Buehler C, Teney D, Johnson M, Gould S, Zhang L. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 6077-6086.
[18] 李亚超, 熊德意, 张民. 神经机器翻译综述[J]. 计算机学报, 2018, 41(12): 2734-2755.
Li Y C, Xiong D Y, Zhang M. A survey of neural machine translation[J]. Chinese Journal of Computers, 2018, 41(12): 2734-2755. (in Chinese)
[19] Vinyals O, Toshev A, Bengio S, Erhan D. Show and tell: lessons learned from the 2015 MSCOCO image captioning challenge[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 652-663.
[20] 王红, 史金钏, 张志伟. 基于注意力机制的LSTM的语义关系抽取[J]. 计算机应用研究, 2018, 35(3): 1417-1420.
Wang H, Shi J X, Zhang Z W. Text semantic relation extraction of LSTM based on attention mechanism[J]. Application Research of Computers, 2018, 35(3): 1417-1420. (in Chinese)
[21] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate[C]//3rd International Conference on Learning Representations, San Diego, May 7-9, 2015.
[22] Papineni K, Roukos S, Ward T, Zhu W J. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002: 311-318.
[23] Banerjee S, Lavie A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments[C]//Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 2005: 65-72.
[24] Lin C Y. ROUGE: a package for automatic evaluation of summaries[C]//Proceedings of the ACL-04 Workshop on Text Summarization Branches Out, Barcelona, 2004: 74-81.