Journal of Applied Sciences ›› 2013, Vol. 31 ›› Issue (3): 259-265. doi: 10.3969/j.issn.0255-8297.2013.03.007

• Signal and Information Processing •

Subword-Based Position Specific Posterior Lattices for Chinese Spoken Document Indexing

LU Ming-ming, ZHANG Lian-hai, QU Dan   

  1. School of Information Engineering, PLA Information Engineering University, Zhengzhou 450002, China
  • Received: 2011-10-14 Revised: 2011-12-31 Online: 2013-05-28 Published: 2011-12-31

Abstract: A spoken document indexing method based on subword-based position specific posterior lattices (S-PSPL) is proposed to resolve the mismatch between the optimal recognition unit and the optimal retrieval unit in existing Chinese spoken document indexing methods. In the proposed method, a word-based PSPL is first generated with a word-based speech recognizer. Each word in the PSPL is then replaced by its constituent subword units. Using the posterior probability relationship between each word and its constituent subword units, the word-based PSPL is converted to the corresponding S-PSPL, from which a subword-based index is built for retrieval. Experimental results show that the method makes use of a well-trained language model and also avoids incorrect segmentation in the word-based recognizer. It performs better than current indexing methods that use words as both the recognition unit and the retrieval unit.
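
The idea of accumulating word posteriors onto subword positions can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact conversion formula, so the sketch assumes the recognizer output is available as an explicit list of word sequences with path posteriors (a hypothetical input format standing in for the word lattice), treats each Chinese character as one subword unit, and uses made-up posterior values. The function name `subword_pspl` is likewise illustrative.

```python
from collections import defaultdict


def subword_pspl(paths):
    """Build a subword position-specific posterior table.

    paths: list of (word_sequence, path_posterior) pairs. This explicit
    path list is an assumed stand-in for a word lattice; a real system
    would accumulate the same quantities over the lattice itself.
    Returns {subword_position: {subword: posterior}}.
    """
    bins = defaultdict(lambda: defaultdict(float))
    for words, posterior in paths:
        position = 0
        for word in words:
            # Assumption: each character of a word is one subword unit.
            for subword in word:
                # A subword inherits the posterior of every path that
                # places it at this subword position.
                bins[position][subword] += posterior
                position += 1
    return {pos: dict(entries) for pos, entries in bins.items()}


# Illustrative hypotheses with invented posteriors that sum to 1.
paths = [
    (["中国", "人民"], 0.6),
    (["中", "国人", "民"], 0.3),   # different segmentation, same characters
    (["中国", "人名"], 0.1),       # recognition error in the last character
]

index = subword_pspl(paths)
for position in sorted(index):
    print(position, index[position])
```

In this toy example the first two hypotheses differ only in word segmentation, so at the word level they would compete with each other, while at the subword level their posteriors add up at every character position. That is the effect the abstract describes as avoiding incorrect segmentation while still recognizing with words.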

Key words: spoken document retrieval, spoken document indexing, subword-based position specific posterior lattices, lattice, subword posterior probability
