Journal of Applied Sciences ›› 2023, Vol. 41 ›› Issue (4): 669-681.doi: 10.3969/j.issn.0255-8297.2023.04.011

• Signal and Information Processing •

Environmental Sound Classification Method Based on Color Channel Feature Fusion

DONG Shaojiang1, XIA Zhengfu1, FANG Nengwei2, XING Bin2, HU Xiaolin2   

  1. School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400074, China;
    2. Chongqing Industrial Big Data Innovation Center Co. Ltd., Chongqing 400707, China
  • Received: 2021-09-24  Published: 2023-08-02

Abstract: To address the low classification accuracy of traditional neural networks on complex environmental sounds, an environmental sound classification method based on color channel feature fusion is proposed. First, three acoustic features are extracted from the raw audio data: the log-Mel spectrogram (LMS), Mel-frequency cepstral coefficients (MFCC), and the energy spectrum (ES). These three features are then assigned to the R, G, and B color channels respectively and fused into a single spectrogram image, which represents the environmental sound more comprehensively than any single feature. Subsequently, to avoid the poor generalization that results from training on limited data, the pre-trained VGG-16 network is adapted by fine-tuning. Finally, the effectiveness of the proposed method is verified on two widely used environmental sound classification datasets as well as on audio collected in real scenarios, and its accuracy is compared with that of other models. The results show that the proposed method reaches 88.2% accuracy on ESC-10 and 65.2% on ESC-50, and improves classification performance on audio collected in real scenarios.

Key words: RGB color channel, feature fusion, fine-tuning, environmental sound classification, pre-trained model
