Journal of Applied Sciences ›› 2021, Vol. 39 ›› Issue (4): 641-649. doi: 10.3969/j.issn.0255-8297.2021.04.011

• Special Issue on CCF NCCA 2020 •

Environmental Sound Recognition Based on Attention Sinusoidal Representation Network

PENG Ning1,3, CHEN Aibin1,2,3, ZHOU Guoxiong1,3, CHEN Wenjie1,3, LIU Jing1,3   

    1. Institute of Artificial Intelligence Application, Central South University of Forestry and Technology, Changsha 410004, Hunan, China;
    2. Hunan Key Laboratory of Intelligent Logistics Technology, Central South University of Forestry and Technology, Changsha 410004, Hunan, China;
    3. College of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, Hunan, China
  • Received: 2020-08-23 Published: 2021-08-04

Abstract: This paper proposes an attention sinusoidal representation network (A-SIREN) for environmental sound recognition. First, Mel-frequency cepstral coefficients (MFCCs) are extracted from the dataset as the audio recognition feature. Then, a gated recurrent unit (GRU) network performs feature extraction on each MFCC frame. Next, an audio score is computed for each frame using a sine function, and the audio is re-weighted according to these per-frame scores. Finally, a fully connected layer combined with a Softmax classifier discriminates the environmental sound categories. The model is validated on the open-source Urban Sound 8K dataset and its performance is compared with that of other models. Experimental results show that A-SIREN achieves the best performance among the compared models on Urban Sound 8K, with a recognition rate of 93.5%.
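The abstract does not give the exact form of the sine-based scoring, so the following is only a minimal sketch of the re-weighting and classification steps under stated assumptions: each frame score is taken as sin(w · h_t + b), the scores are normalized across frames with a softmax, and the re-weighted frames are summed before the fully connected layer. The function names, the parameters `w`, `b`, `class_weights`, and `class_biases`, and the toy frame vectors (standing in for GRU outputs) are all hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def sine_attention_pool(frames, w, b=0.0):
    """Score each frame with sin(w . h_t + b), normalize, and pool.

    frames: list of per-frame feature vectors (e.g. GRU outputs).
    w, b:   hypothetical scoring parameters (learned in a real model).
    """
    scores = [math.sin(sum(wi * hi for wi, hi in zip(w, h)) + b) for h in frames]
    weights = softmax(scores)  # assumption: scores are normalized across frames
    dim = len(frames[0])
    # Weighted sum of frames: frames with higher scores contribute more.
    pooled = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

def classify(pooled, class_weights, class_biases):
    """Fully connected layer followed by a softmax over classes."""
    logits = [sum(wi * xi for wi, xi in zip(row, pooled)) + bias
              for row, bias in zip(class_weights, class_biases)]
    return softmax(logits)

# Toy input: 3 frames of 2-dimensional features standing in for GRU outputs.
frames = [[0.1, 0.2], [0.9, 0.4], [0.3, 0.8]]
pooled, weights = sine_attention_pool(frames, w=[1.0, 0.5])
probs = classify(pooled,
                 class_weights=[[1.0, -1.0], [-1.0, 1.0]],
                 class_biases=[0.0, 0.0])
```

In this toy run the second frame has the largest sine score and therefore receives the largest attention weight; in a trained model the scoring parameters would be learned jointly with the GRU and the classifier.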

Key words: environmental sound recognition, attention mechanism, Mel-frequency cepstral coefficient (MFCC), gated recurrent unit (GRU), attention sinusoidal representation network (A-SIREN)

CLC Number: